Preface 


This book develops the ideas behind and properties of wavelets and shows 
how they can be used as analytical tools for signal processing, numerical 
analysis, and mathematical modeling. We try to present this in a way that is 
accessible to the engineer, scientist, and applied mathematician both as a 
theoretical approach and as a potentially practical method to solve 
problems. Although the roots of this subject go back some time, the modern 
interest and development have a history of only a few decades. 


The early work was in the 1980's by Morlet, Grossmann, Meyer, Mallat, 
and others, but it was the paper by Ingrid Daubechies [link] in 1988 that 
caught the attention of the larger applied mathematics communities in 
signal processing, statistics, and numerical analysis. Much of the early work 
took place in France [link], [link] and the USA [link], [link], [link], [link]. 
As in many new disciplines, the first work was closely tied to a particular 
application or traditional theoretical framework. Now we are seeing the 
theory abstracted from application and developed on its own and seeing it 
related to other parallel ideas. Our own background and interests in signal 
processing certainly influence the presentation of this book. 


The goal of most modern wavelet research is to create a set of basis 
functions (or general expansion functions) and transforms that will give an 
informative, efficient, and useful description of a function or signal and 
allow more effective and efficient processing. If the signal is represented as 
a function of time, wavelets provide efficient localization in both time and 
frequency or scale. Another central idea is that of multiresolution where the 
decomposition of a signal is in terms of the resolution of detail. 


For the Fourier series, sinusoids are chosen as basis functions, then the 
properties of the resulting expansion are examined. For wavelet analysis, 
one poses the desired properties and then derives the resulting basis 
functions. An important property of the wavelet basis is providing a 
multiresolution analysis. For several reasons, it is often desired to have the 
basis functions orthonormal. Given these goals, you will see aspects of 
correlation techniques, Fourier transforms, short-time Fourier transforms, 
discrete Fourier transforms, Wigner distributions, filter banks, subband 
coding, and other signal expansion and processing methods in the results. 


Wavelet-based analysis is an exciting new problem-solving tool for the 
mathematician, scientist, and engineer. It fits naturally with the digital 
computer with its basis functions defined by summations not integrals or 
derivatives. Unlike most traditional expansion systems, the basis functions 
of the wavelet analysis are not solutions of differential equations. In some 
areas, it is the first truly new tool we have had in many years. Indeed, use of 
wavelets and wavelet transforms requires a new point of view and a new 
method of interpreting representations that we are still learning how to 
exploit. 


Work by Donoho, Johnstone, Coifman, and others has added theoretical 
reasons for why wavelet analysis is so versatile and powerful, and has 
given generalizations that are still being worked on. They have shown that 
wavelet systems have some inherent generic advantages and are near 
optimal for a wide class of problems [link]. They also show that adaptive 
means can create special wavelet systems for particular signals and classes 
of signals. 


The multiresolution decomposition seems to separate components of a 
signal in a way that is superior to most other methods for analysis, 
processing, or compression. Because of the ability of the discrete wavelet 
transform to decompose a signal at different independent scales and to do it 
in a very flexible way, Burke calls wavelets “The Mathematical 
Microscope” [link], [link]. Because of this powerful and flexible 
decomposition, linear and nonlinear processing of signals in the wavelet 
transform domain offers new methods for signal detection, filtering, and 
compression [link], [link], [link], [link], [link], [link]. It also can be used as 
the basis for robust numerical algorithms. 


You will also see an interesting connection and equivalence to filter bank 
theory from digital signal processing [link], [link]. Indeed, some of the 
results obtained with filter banks are the same as with discrete-time 
wavelets, and this has been developed in the signal processing community 
by Vetterli, Vaidyanathan, Smith and Barnwell, and others. Filter banks, as 
well as most algorithms for calculating wavelet transforms, are part of a still 
more general area of multirate and time-varying systems. 


The presentation here will be as a tutorial or primer for people who know 
little or nothing about wavelets but do have a technical background. It 
assumes a knowledge of Fourier series and transforms and of linear algebra 
and matrix theory. It also assumes a background equivalent to a B.S. degree 
in engineering, science, or applied mathematics. Some knowledge of signal 
processing is helpful but not essential. We develop the ideas in terms of 
one-dimensional signals [link] modeled as real or perhaps complex 
functions of time, but the ideas and methods have also proven effective in 
image representation and processing [link], [link] dealing with two, three, 
or even four or more dimensions. Vector spaces have proved to be a natural 
setting for developing both the theory and applications of wavelets. Some 
background in that area is helpful but can be picked up as needed [link]. 
The study and understanding of wavelets is greatly assisted by using some 
sort of wavelet software system to work out examples and run experiments. 
Matlab programs are included at the end of this book and on our web 
site (noted at the end of the preface). Several other systems are mentioned 
in Chapter: Wavelet-Based Signal Processing and Applications. 


There are several different approaches that one could take in presenting 
wavelet theory. We have chosen to start with the representation of a signal 
or function of continuous time in a series expansion, much as a Fourier 
series is used in a Fourier analysis. From this series representation, we can 
move to the expansion of a function of a discrete variable (e.g., samples of a 
signal) and the theory of filter banks to efficiently calculate and interpret the 
expansion coefficients. This would be analogous to the discrete Fourier 
transform (DFT) and its efficient implementation, the fast Fourier transform 
(FFT). We can also go from the series expansion to an integral transform 
called the continuous wavelet transform, which is analogous to the Fourier 
transform or Fourier integral. We feel starting with the series expansion 
gives the greatest insight and provides ease in seeing both the similarities 
and differences with Fourier analysis. 


This book is organized into sections and chapters, each somewhat 
self-contained. The earlier chapters give a fairly complete development of the 
discrete wavelet transform (DWT) as a series expansion of signals in terms 
of wavelets and scaling functions. The later chapters are short descriptions 
of generalizations of the DWT and of applications. They give references to 
other works, and serve as a sort of annotated bibliography. Because we 
intend this book as an introduction to wavelets which already have an 
extensive literature, we have included a rather long bibliography. However, 
it will soon be incomplete because of the large number of papers that are 
currently being published. Nevertheless, a guide to the other literature is 
essential to our goal of an introduction. 


A good sketch of the philosophy of wavelet analysis and the history of its 
development can be found in a book published by the National Academy of 
Sciences in the chapter by Barbara Burke [link]. She has written an excellent 
expanded version in [link], which should be read by anyone interested in 
wavelets. Daubechies gives a brief history of the early research in [link]. 


Many of the results and relationships presented in this book are in the form 
of theorems and proofs or derivations. A real effort has been made to ensure 
the correctness of the statements of theorems but the proofs are often only 
outlines of derivations intended to give insight into the result rather than to 
be a formal proof. Indeed, many of the derivations are put in the Appendix 
in order not to clutter the presentation. We hope this style will help the 
reader gain insight into this very interesting but sometimes obscure new 
mathematical signal processing tool. 


We use a notation that is a mixture of that used in the signal processing 
literature and that in the mathematical literature. We hope this will make the 
ideas and results more accessible, but some uniformity and cleanness are lost. 


The authors acknowledge AFOSR, ARPA, NSF, Nortel, Inc., Texas 
Instruments, Inc. and Aware, Inc. for their support of this work. We 
specifically thank H. L. Resnikoff, who first introduced us to wavelets and 
who proved remarkably accurate in predicting their power and success. We 
also thank W. M. Lawton, R. O. Wells, Jr., R. G. Baraniuk, J. E. Odegard, I. 
W. Selesnick, M. Lang, J. Tian, and members of the Rice Computational 
Mathematics Laboratory for many of the ideas and results presented in this 
book. The first named author would like to thank the Maxfield and Oshman 
families for their generous support. The students in EE-531 and EE-696 at 
Rice University provided valuable feedback as did Bruce Francis, Strela 
Vasily, Hans Schüssler, Peter Steffen, Gary Sitton, Jim Lewis, Yves Angel, 
Curt Michel, J. H. Husoy, Kjersti Engan, Ken Castleman, Jeff Trinkle, 
Katherine Jones, and other colleagues at Rice and elsewhere. 


We also particularly want to thank Tom Robbins and his colleagues at 
Prentice Hall for their support and help. Their reviewers added significantly 
to the book. 


We would appreciate learning of any errors or misleading statements that 
any readers discover. Indeed, any suggestions for improvement of the book 
would be most welcome. Send suggestions or comments via email to 
csb@rice.edu. Software, articles, errata for this book, and other information 
on the wavelet research at Rice can be found on the world-wide-web URL: 
http://dsp.rice.edu/ with links to other sites where wavelet research is being 
done. 


C. Sidney Burrus, Ramesh A. Gopinath, and Haitao Guo 


Houston, Texas; Yorktown Heights, New York; and Cupertino, California 


Instructions to the Reader 


Although this book is arranged in a somewhat progressive order, starting 
with basic ideas and definitions, moving to a rather complete discussion of 
the basic wavelet system, and then on to generalizations, one should skip 
around when reading or studying from it. Depending on the background of 
the reader, he or she should skim over most of the book first, then go back 
and study parts in detail. The Introduction at the beginning and the 
Summary at the end should be continually consulted to gain or keep a 
perspective; similarly for the Table of Contents and Index. The Matlab 
programs in the Appendix or the Wavelet Toolbox from Mathworks or other 
wavelet software should be used for continual experimentation. The list of 
references should be used to find proofs or detail not included here or to 
pursue research topics or applications. The theory and application of 
wavelets are still developing and in a state of rapid growth. We hope this 
book will help open the door to this fascinating new subject. 


OpenStax-Connexions Edition 


We thank Pearson, Inc. for permission (given in 2012) to put this content 
(originally published in 1998 with Prentice Hall) into the OpenStax Cnx 
system online under the Creative Commons attribution only (cc-by) 
copyright license. We also thank Daniel Williamson at OpenStax for his 
contributions. This edition has some minor errors corrected and some more 
recent references added. In particular, we note Stéphane Mallat's latest book, A 
Wavelet Tour of Signal Processing [link], also available in OpenStax at 
https://legacy.cnx.org/content/col10711/latest/, and Kovačević, Goyal, and 
Vetterli's new book, Fourier and Wavelet Signal Processing [link], online at 
http://www.fourierandwavelets.org/. A valuable collection of basic papers 
has been published [link], as has a book on frames [link]. 


If one starts with Louis Scharf's book, A First Course in Electrical and 
Computer Engineering, which is in OpenStax at 
https://legacy.cnx.org/content/col10685/latest/, followed by Richard 
Baraniuk's book, Signals and Systems, at 
https://legacy.cnx.org/content/col10064/latest/ and Martin Vetterli et al.'s 
book, Foundations of Signal Processing, at 
http://www.fourierandwavelets.org/, one has an excellent set of signal 
processing resources, all online. 


Introduction to Wavelets 


This chapter will provide an overview of the topics to be developed in the 
book. Its purpose is to present the ideas, goals, and outline of properties for 
an understanding of and ability to use wavelets and wavelet transforms. The 
details and more careful definitions are given later in the book. 


A wave is usually defined as an oscillating function of time or space, such 
as a sinusoid. Fourier analysis is wave analysis. It expands signals or 
functions in terms of sinusoids (or, equivalently, complex exponentials) 
which has proven to be extremely valuable in mathematics, science, and 
engineering, especially for periodic, time-invariant, or stationary 
phenomena. A wavelet is a “small wave”, which has its energy concentrated 
in time to give a tool for the analysis of transient, nonstationary, or 
time-varying phenomena. It still has the oscillating wave-like characteristic but 
also has the ability to allow simultaneous time and frequency analysis with 
a flexible mathematical foundation. This is illustrated in [link] with the 
wave (sinusoid) oscillating with equal amplitude over $-\infty < t < \infty$ and, 
therefore, having infinite energy and with the wavelet in [link] having its 
finite energy concentrated around a point in time. 


A Wave and a Wavelet: A Sine Wave and a Wavelet 


We will take wavelets and use them in a series expansion of signals or 
functions much the same way a Fourier series uses the wave or sinusoid to 
represent a signal or function. The signals are functions of a continuous 
variable, which often represents time or distance. From this series 
expansion, we will develop a discrete-time version similar to the discrete 
Fourier transform where the signal is represented by a string of numbers 
where the numbers may be samples of a signal, samples of another string of 
numbers, or inner products of a signal with some expansion set. Finally, we 
will briefly describe the continuous wavelet transform where both the signal 
and the transform are functions of continuous variables. This is analogous 
to the Fourier transform. 


Wavelets and Wavelet Expansion Systems 
Before delving into the details of wavelets and their properties, we need to 
get some idea of their general characteristics and what we are going to do 
with them [link]. 


What is a Wavelet Expansion or a Wavelet Transform? 


A signal or function f(t) can often be better analyzed, described, or 
processed if expressed as a linear decomposition by 
Equation: 

$$f(t) = \sum_{\ell} a_\ell\, \psi_\ell(t)$$

where $\ell$ is an integer index for the finite or infinite sum, $a_\ell$ are the 
real-valued expansion coefficients, and $\psi_\ell(t)$ are a set of real-valued functions 
of $t$ called the expansion set. If the expansion [link] is unique, the set is 
called a basis for the class of functions that can be so expressed. If the basis 
is orthonormal, meaning 

Equation: 

$$\langle \psi_k(t), \psi_\ell(t) \rangle = \int \psi_k(t)\, \psi_\ell(t)\, dt = 0, \qquad k \neq \ell,$$


then the coefficients can be calculated by the inner product 
Equation: 

$$a_k = \langle f(t), \psi_k(t) \rangle = \int f(t)\, \psi_k(t)\, dt.$$


One can see that substituting [link] into [link] and using [link] gives the 
single $a_k$ coefficient. If the basis set is not orthogonal, then a dual basis set 
$\tilde{\psi}_k(t)$ exists such that using [link] with the dual basis gives the desired 
coefficients. This will be developed in Chapter: A multiresolution 
formulation of Wavelet Systems. 
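
The expansion and inner-product formulas above can be checked numerically in a finite-dimensional setting. The following is a minimal sketch, not code from this book: vectors in $\mathbb{R}^8$ stand in for functions, sums stand in for integrals, and the basis is a hypothetical orthonormal set obtained from a QR factorization of a random matrix (it is not a wavelet basis). The second half illustrates the dual-basis idea for a nonorthogonal basis, again with an arbitrary random matrix assumed to be invertible.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8

# Columns of Q form an orthonormal basis of R^N (illustrative stand-in for psi_k).
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
gram = Q.T @ Q                      # <psi_k, psi_l>; should be the identity

f = rng.standard_normal(N)          # the "signal"
a = Q.T @ f                         # a_k = <f, psi_k>
f_rec = Q @ a                       # f = sum_k a_k psi_k
reconstruction_error = np.max(np.abs(f - f_rec))

# Nonorthogonal basis: coefficients come from inner products with the dual basis.
B = rng.standard_normal((N, N))     # columns are basis vectors (assumed invertible)
B_dual = np.linalg.inv(B).T         # columns are the dual basis vectors
a_nonorth = B_dual.T @ f            # a_k = <f, dual_k>
f_rec2 = B @ a_nonorth              # f = sum_k a_k b_k

print(reconstruction_error, np.max(np.abs(f - f_rec2)))
```

Both reconstruction errors should be at the level of floating-point rounding. In the orthonormal case the dual basis coincides with the basis itself, which is why a single set of functions serves for both analysis and synthesis.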


For a Fourier series, the orthogonal basis functions $\psi_k(t)$ are $\sin(k\omega_0 t)$ 
and $\cos(k\omega_0 t)$ with frequencies of $k\omega_0$. For a Taylor's series, the 
nonorthogonal basis functions are simple monomials $t^k$, and for many other 
expansions they are various polynomials. There are expansions that use 
splines and even fractals. 


For the wavelet expansion, a two-parameter system is constructed such that 
[link] becomes 
Equation: 

$$f(t) = \sum_{k} \sum_{j} a_{j,k}\, \psi_{j,k}(t)$$

where both $j$ and $k$ are integer indices and the $\psi_{j,k}(t)$ are the wavelet 
expansion functions that usually form an orthogonal basis. 


The set of expansion coefficients $a_{j,k}$ is called the discrete wavelet 
transform (DWT) of $f(t)$ and [link] is the inverse transform. 


What is a Wavelet System? 


The wavelet expansion set is not unique. There are many different wavelet 
systems that can be used effectively, but all seem to have the following 
three general characteristics [link]. 


1. A wavelet system is a set of building blocks to construct or represent a 
signal or function. It is a two-dimensional expansion set (usually a 
basis) for some class of one- (or higher) dimensional signals. In other 
words, if the wavelet set is given by $\psi_{j,k}(t)$ for indices of 
$j, k = 1, 2, \cdots$, a linear expansion would be 
$f(t) = \sum_{k} \sum_{j} a_{j,k}\, \psi_{j,k}(t)$ for some set of coefficients $a_{j,k}$. 

2. The wavelet expansion gives a time-frequency localization of the 
signal. This means most of the energy of the signal is well represented 
by a few expansion coefficients, $a_{j,k}$. 

3. The calculation of the coefficients from the signal can be done 
efficiently. It turns out that many wavelet transforms (the set of 
expansion coefficients) can be calculated with $O(N)$ operations. This 
means the number of floating-point multiplications and additions 
increases linearly with the length of the signal. More general wavelet 
transforms require $O(N \log(N))$ operations, the same as for the fast 
Fourier transform (FFT) [link]. 


Virtually all wavelet systems have these very general characteristics. Where 
the Fourier series maps a one-dimensional function of a continuous variable 
into a one-dimensional sequence of coefficients, the wavelet expansion 
maps it into a two-dimensional array of coefficients. We will see that it is 
this two-dimensional representation that allows localizing the signal in both 
time and frequency. A Fourier series expansion localizes in frequency in 
that if a Fourier series expansion of a signal has only one large coefficient, 
then the signal is essentially a single sinusoid at the frequency determined 
by the index of the coefficient. The simple time-domain representation of 
the signal itself gives the localization in time. If the signal is a simple pulse, 
the location of that pulse is the localization in time. A wavelet 
representation will give the location in both time and frequency 
simultaneously. Indeed, a wavelet representation is much like a musical 
score where the location of the notes tells when the tones occur and what 
their frequencies are. 


More Specific Characteristics of Wavelet Systems 


There are three additional characteristics [link], [link] that are more specific 
to wavelet expansions. 


1. All so-called first-generation wavelet systems are generated from a 
single scaling function or wavelet by simple scaling and translation. 
The two-dimensional parameterization is achieved from the function 
(sometimes called the generating wavelet or mother wavelet) $\psi(t)$ by 
Equation: 

$$\psi_{j,k}(t) = 2^{j/2}\, \psi(2^j t - k), \qquad j, k \in \mathbb{Z}$$

where $\mathbb{Z}$ is the set of all integers and the factor $2^{j/2}$ maintains a 
constant norm independent of scale $j$. This parameterization of the 
time or space location by $k$ and the frequency or scale (actually the 
logarithm of scale) by $j$ turns out to be extraordinarily effective. 

2. Almost all useful wavelet systems also satisfy the multiresolution 
conditions. This means that if a set of signals can be represented by a 
weighted sum of $\varphi(t - k)$, then a larger set (including the original) can 
be represented by a weighted sum of $\varphi(2t - k)$. In other words, if the 
basic expansion signals are made half as wide and translated in steps 
half as wide, they will represent a larger class of signals exactly or give 
a better approximation of any signal. 

3. The lower resolution coefficients can be calculated from the higher 
resolution coefficients by a tree-structured algorithm called a filter 
bank. This allows a very efficient calculation of the expansion 
coefficients (also known as the discrete wavelet transform) and relates 
wavelet transforms to an older area in digital signal processing. 
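
The tree-structured calculation described in the list above can be sketched with the simplest case, the Haar system. This is an illustrative sketch rather than the book's Matlab code: the function names `haar_dwt` and `haar_idwt` are hypothetical, the input samples are assumed to play the role of the finest-resolution coefficients, and the length is assumed to be a power of two. The operation counter is included only to make the linear growth in cost visible.

```python
import numpy as np

def haar_dwt(x):
    """Haar filter bank tree: repeatedly split into lowpass and highpass halves."""
    c = np.asarray(x, dtype=float)
    details = []                              # finest scale first
    ops = 0                                   # additions plus multiplications
    while len(c) > 1:
        even, odd = c[0::2], c[1::2]
        d = (even - odd) / np.sqrt(2.0)       # highpass filter, then downsample
        c = (even + odd) / np.sqrt(2.0)       # lowpass filter, then downsample
        ops += 4 * len(d)                     # 2 adds + 2 multiplies per pair
        details.append(d)
    return c, details, ops

def haar_idwt(c, details):
    """Invert haar_dwt by running the tree from the coarsest scale upward."""
    c = np.asarray(c, dtype=float)
    for d in reversed(details):
        out = np.empty(2 * len(d))
        out[0::2] = (c + d) / np.sqrt(2.0)
        out[1::2] = (c - d) / np.sqrt(2.0)
        c = out
    return c

x = np.arange(16.0)
c, details, ops = haar_dwt(x)
x_rec = haar_idwt(c, details)
print(ops, np.max(np.abs(x - x_rec)))
```

Each stage halves the amount of data it passes on, so the total work is a geometric series bounded by a constant times the signal length. With this way of counting, a length-$N$ input costs $4N - 4$ operations.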


The operations of translation and scaling seem to be basic to many practical 
signals and signal-generating processes, and their use is one of the reasons 
that wavelets are efficient expansion functions. [link] is a pictorial 
representation of the translation and scaling of a single mother wavelet 
described in [link]. As the index k changes, the location of the wavelet 
moves along the horizontal axis. This allows the expansion to explicitly 
represent the location of events in time or space. As the index $j$ changes, the 
shape of the wavelet changes in scale. This allows a representation of detail 
or resolution. Note that as the scale becomes finer ($j$ larger), the steps in 
time become smaller. It is both the narrower wavelet and the smaller steps 
that allow representation of greater detail or higher resolution. For clarity, 
only every fourth term in the translation ($k = 1, 5, 9, 13, \cdots$) is shown; 
otherwise, the figure would be cluttered. What is not illustrated here but is important 
is that the shape of the basic mother wavelet can also be changed. That is 
done during the design of the wavelet system and allows one set to 
represent a particular class of signals well. 
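
The effect of the scaling and translation indices can be checked numerically. The sketch below assumes the standard Haar wavelet as the mother wavelet ($+1$ on $[0, 1/2)$, $-1$ on $[1/2, 1)$, zero elsewhere), which is a choice made here for simplicity. The helper names `haar_psi`, `psi_jk`, and `inner` are hypothetical, and integrals are approximated by midpoint sums on a fine grid.

```python
import numpy as np

def haar_psi(t):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    t = np.asarray(t, dtype=float)
    up = np.where((t >= 0.0) & (t < 0.5), 1.0, 0.0)
    down = np.where((t >= 0.5) & (t < 1.0), 1.0, 0.0)
    return up - down

def psi_jk(t, j, k):
    """Scaled and translated wavelet 2^(j/2) * psi(2^j t - k)."""
    return 2.0 ** (j / 2.0) * haar_psi(2.0 ** j * t - k)

# Midpoint grid on [0, 8); the spacing is a power of two, so every
# breakpoint of the wavelets used below falls on a cell boundary.
n = 2 ** 14
dt = 8.0 / n
t = (np.arange(n) + 0.5) * dt

def inner(u, v):
    """Midpoint-rule approximation of the integral of u(t) v(t)."""
    return float(np.sum(u * v) * dt)

norms = [inner(psi_jk(t, j, k), psi_jk(t, j, k))
         for j in range(4) for k in (0, 1, 5)]
cross = inner(psi_jk(t, 0, 1), psi_jk(t, 1, 2))
print(norms, cross)
```

The squared norms stay at one for every $j$ and $k$ tried, which is the role of the factor $2^{j/2}$. The cross term pairs two wavelets at different scales whose supports overlap, and it still integrates to zero.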


Translation (every fourth $k$) and Scaling of a Wavelet $\psi_{D4}$ 


For the Fourier series and transform and for most signal expansion systems, 
the expansion functions (bases) are chosen, then the properties of the 
resulting transform are derived and 
analyzed. For the wavelet system, the desired properties or characteristics 
are mathematically required, then the resulting basis functions are derived. 
Because these constraints do not use all the degrees of freedom, other 
properties can be required to customize the wavelet system for a particular 
application. Once you decide on a Fourier series, the sinusoidal basis 
functions are completely set. That is not true for the wavelet. There are an 
infinity of very different wavelets that all satisfy the above properties. 
Indeed, the understanding and design of the wavelets is an important topic 
of this book. 


Wavelet analysis is well-suited to transient signals. Fourier analysis is 
appropriate for periodic signals or for signals whose statistical 
characteristics do not change with time. It is the localizing property of 
wavelets that allows a wavelet expansion of a transient event to be modeled 
with a small number of coefficients. This turns out to be very useful in 
applications. 


Haar Scaling Functions and Wavelets 


The multiresolution formulation needs two closely related basic functions. 
In addition to the wavelet $\psi(t)$ that has been discussed (but not actually 
defined yet), we will need another basic function called the scaling function 
$\varphi(t)$. The reasons for needing this function and the details of the relations 
will be developed in the next chapter, but here we will simply use it in the 
wavelet expansion. 


The simplest possible orthogonal wavelet system is generated from the 
Haar scaling function and wavelet. These are shown in [link]. Using a 
combination of these scaling functions and wavelets allows a large class of 
signals to be represented by 

Equation: 

$$f(t) = \sum_{k=-\infty}^{\infty} c_k\, \varphi(t - k) + \sum_{k=-\infty}^{\infty} \sum_{j=0}^{\infty} d_{j,k}\, \psi(2^j t - k).$$


Haar [link] showed this result in 1910, and we now know that wavelets are 
a generalization of his work. An example of a Haar system and expansion is 
given at the end of Chapter: A multiresolution formulation of Wavelet 
systems. 


What do Wavelets Look Like? 


All Fourier basis functions look alike. A high-frequency sine wave looks 
like a compressed low-frequency sine wave. A cosine wave is a sine wave 
translated by 90° or $\pi/2$ radians. It takes a 
large number of Fourier components to represent a discontinuity or a sharp 
corner. In contrast, there are many different wavelets and some have sharp 
corners themselves. 


Haar Scaling Function and Wavelet: (a) $\varphi(t)$ (b) $\psi(t)$ 


To appreciate the special character of wavelets you should recognize that it 
was not until the late 1980's that some of the most useful basic wavelets 
were ever seen. [link] illustrates four different scaling functions, each being 
zero outside of $0 < t < 6$ and each generating an orthogonal wavelet basis 
for all square integrable functions. This figure is also shown on the cover to 
this book. 


Several more scaling functions and their associated wavelets are illustrated 
in later chapters, and the Haar wavelet is shown in [link] and in detail at the 
end of Chapter: A multiresolution formulation of Wavelet Systems. 


Example Scaling Functions (See Section: Further Properties 
of the Scaling Function and Wavelet for the meaning of $\alpha$ 
and $\beta$) 


Why is Wavelet Analysis Effective? 


Wavelet expansions and wavelet transforms have proven to be very efficient 
and effective in analyzing a very wide class of signals and phenomena. 
Why is this true? What are the properties that give this effectiveness? 


1. The size of the wavelet expansion coefficients $a_{j,k}$ in [link] or $d_{j,k}$ in 
[link] drops off rapidly with $j$ and $k$ for a large class of signals. This 
property is called being an unconditional basis and it is why wavelets 
are so effective in signal and image compression, denoising, and 
detection. Donoho [link], [link] showed that wavelets are near optimal 
for a wide class of signals for compression, denoising, and detection. 

2. The wavelet expansion allows a more accurate local description and 
separation of signal characteristics. A Fourier coefficient represents a 
component that lasts for all time and, therefore, temporary events must 
be described by a phase characteristic that allows cancellation or 
reinforcement over large time periods. A wavelet expansion coefficient 
represents a component that is itself local and is easier to interpret. The 
wavelet expansion may allow a separation of components of a signal 
whose Fourier descriptions overlap in both time and frequency. 

3. Wavelets are adjustable and adaptable. Because there is not just one 
wavelet, they can be designed to fit individual applications. They are 
ideal for adaptive systems that adjust themselves to suit the signal. 

4. The generation of wavelets and the calculation of the discrete wavelet 
transform are well matched to the digital computer. We will later see 
that the defining equation for a wavelet uses no calculus. There are no 
derivatives or integrals, just multiplications and additions—operations 
that are basic to a digital computer. 
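
The locality described in the list above can be illustrated by counting nonzero coefficients for a short pulse. This is a hedged sketch under favorable assumptions, not a general result: it uses the Haar system, a hypothetical helper `haar_coeffs`, and a pulse deliberately aligned with the dyadic grid, which is close to the best case for Haar. The discrete Fourier transform of the same samples is computed for comparison.

```python
import numpy as np

def haar_coeffs(x):
    """All orthonormal Haar coefficients of a length-2^J sequence, as one array."""
    c = np.asarray(x, dtype=float)
    out = []
    while len(c) > 1:
        even, odd = c[0::2], c[1::2]
        out.append((even - odd) / np.sqrt(2.0))   # detail coefficients at this level
        c = (even + odd) / np.sqrt(2.0)           # coarser approximation
    out.append(c)                                 # final scaling coefficient
    return np.concatenate(out)

N = 256
x = np.zeros(N)
x[96:104] = 1.0                                   # short pulse on the dyadic grid

w = haar_coeffs(x)
X = np.fft.fft(x) / np.sqrt(N)                    # unitary scaling, so energies match

tol = 1e-9
n_haar = int(np.sum(np.abs(w) > tol))
n_fourier = int(np.sum(np.abs(X) > tol))
print(n_haar, n_fourier)
```

For this pulse only a handful of Haar coefficients are nonzero, while most of the Fourier coefficients are. Both transforms preserve the signal energy, so the comparison is between how that energy is distributed, not how much of it is captured. A pulse that is not aligned with the dyadic grid would need more Haar coefficients, although still far fewer than the signal length.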


While some of these details may not be clear at this point, they should point 
to the issues that are important to both theory and application and give 
reasons for the detailed development that follows in this and other books. 


The Discrete Wavelet Transform 


This two-variable set of basis functions is used in a way similar to the 
short-time Fourier transform, the Gabor transform, or the Wigner distribution for 
time-frequency analysis [link], [link], [link]. Our goal is to generate a set of 
expansion functions such that any signal in $L^2(\mathbb{R})$ (the space of square 
integrable functions) can be represented by the series 

Equation: 

$$f(t) = \sum_{j,k} a_{j,k}\, 2^{j/2}\, \psi(2^j t - k)$$


or, using [link], as 


Equation: 

$$f(t) = \sum_{j,k} a_{j,k}\, \psi_{j,k}(t)$$

where the two-dimensional set of coefficients $a_{j,k}$ is called the discrete 
wavelet transform (DWT) of $f(t)$. A more specific form indicating how the 
$a_{j,k}$'s are calculated can be written using inner products as 

Equation: 

$$f(t) = \sum_{j,k} \langle \psi_{j,k}(t), f(t) \rangle\, \psi_{j,k}(t)$$

if the $\psi_{j,k}(t)$ form an orthonormal basis [footnote] for the space of signals 
of interest [link]. The inner product is usually defined as 

[Footnote: Bases and tight frames are defined in Chapter: Bases, Orthogonal Bases, 
Biorthogonal Bases, Frames, Tight Frames, and unconditional Bases.] 
Equation: 

$$\langle x(t), y(t) \rangle = \int x^{*}(t)\, y(t)\, dt.$$


The goal of most expansions of a function or signal is to have the 
coefficients of the expansion $a_{j,k}$ give more useful information about the 
signal than is directly obvious from the signal itself. A second goal is to 
have most of the coefficients be zero or very small. This is what is called a 
sparse representation and is extremely important in applications for 
statistical estimation and detection, data compression, nonlinear noise 
reduction, and fast algorithms. 


Although this expansion is called the discrete wavelet transform (DWT), it 
probably should be called a wavelet series since it is a series expansion 
which maps a function of a continuous variable into a sequence of 
coefficients much the same way the Fourier series does. However, that is 
not the convention. 


This wavelet series expansion is in terms of two indices, the time translation 
$k$ and the scaling index $j$. For the Fourier series, there are only two possible 
values of $k$, zero and $\pi/2$, which give the sine terms and the cosine terms. 
The values of $j$ give the frequency harmonics. In other words, the Fourier 
series is also a two-dimensional expansion, but that is not seen in the 
exponential form and generally not noticed in the trigonometric form. 


The DWT of a signal is somewhat difficult to illustrate because it is a 
function of two variables or indices, but we will show the DWT of a simple 
pulse in [link] to illustrate the localization of the transform. Other displays 
will be developed in the next chapter. 


Discrete Wavelet Transform of a Pulse, using $\psi_{D6}$ with a Gain of $\sqrt{2}$ 
for Each Higher Scale. 


The Discrete-Time and Continuous Wavelet Transforms 


If the signal is itself a sequence of numbers, perhaps samples of some 
function of a continuous variable or perhaps a set of inner products, the 
expansion of that signal is called a discrete-time wavelet transform 
(DTWT). It maps a sequence of numbers into a sequence of numbers much 
the same way the discrete Fourier transform (DFT) does. It does not, 
however, require the signal to be finite in duration or periodic as the DFT 
does. To be consistent with Fourier terminology, it probably should be 
called the discrete-time wavelet series, but this is not the convention. If the 
discrete-time signal is finite in length, the transform can be represented by a 
finite matrix. This formulation of a series expansion of a discrete-time 
signal is what filter bank methods accomplish [link], [link] and is developed 
in Chapter: Filter Banks and Transmultiplexers of this book. 
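
For a finite-length signal, the matrix form of the transform mentioned above can be written out explicitly. The following sketch builds an orthonormal Haar analysis matrix recursively. The function name `haar_matrix` and the row ordering (coarsest rows first) are assumptions made here for illustration, and the length is assumed to be a power of two.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar analysis matrix for n = 2^J; rows are the basis vectors."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    top = np.kron(h, [1.0, 1.0])                    # coarser rows, stretched by two
    bottom = np.kron(np.eye(n // 2), [1.0, -1.0])   # finest-scale wavelets
    return np.vstack([top, bottom]) / np.sqrt(2.0)

W = haar_matrix(8)
x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])

coeffs = W @ x              # forward transform: sequence in, sequence out
x_rec = W.T @ coeffs        # inverse transform is the transpose
print(np.round(W @ W.T, 12))
```

Because the rows are orthonormal, the product of the matrix with its transpose is the identity and the inverse transform is simply the transpose. The dense matrix product costs on the order of $N^2$ operations, so it is useful for understanding the transform rather than for computing it quickly.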


If the signal is a function of a continuous variable and a transform that is a 
function of two continuous variables is desired, the continuous wavelet 
transform (CWT) can be defined by 


Equation: 

$$F(a, b) = \int f(t)\, w\!\left(\frac{t - a}{b}\right) dt$$

with an inverse transform of 

Equation: 

$$f(t) = \iint F(a, b)\, w\!\left(\frac{t - a}{b}\right) da\, db$$

where $w(t)$ is the basic wavelet and $a, b \in \mathbb{R}$ are real continuous variables. 
Admissibility conditions for the wavelet $w(t)$ to support this invertible 
transform are discussed by Daubechies [link], Heil and Walnut [link], and 
others and are briefly developed in Section: Discrete Multiresolution 
Analysis, the Discrete-Time Wavelet of this book. The CWT is analogous to the 
Fourier transform or Fourier integral. 


Exercises and Experiments 


As the ideas about wavelets and wavelet transforms are developed in this 
book, it will be very helpful to experiment using the Matlab programs in the 
appendix of this book or in the Matlab Toolbox [link]. An effort has been 
made to use the same notation in the programs in Appendix C as is used in 
the formulas in the book so that going over the programs can help in 
understanding the theory and vice versa. 


This Chapter 


This chapter has tried to set the stage for a careful introduction to both the 
theory and use of wavelets and wavelet transforms. We have presented the 
most basic characteristics of wavelets and tried to give a feeling of how and 
why they work in order to motivate and give direction and structure to the 
following material. 


The next chapter will present the idea of multiresolution, out of which will 
develop the scaling function as well as the wavelet. This is followed by a 
discussion of how to calculate the wavelet expansion coefficients using 
filter banks from digital signal processing. Next, a more detailed 
development of the theory and properties of scaling functions, wavelets, 
and wavelet transforms is given followed by a chapter on the design of 
wavelet systems. Chapter: Filter Banks and Transmultiplexers gives a 
detailed development of wavelet theory in terms of filter banks. 


The earlier part of the book carefully develops the basic wavelet system and 
the later part develops several important generalizations, but in a less 
detailed form. 


A Multiresolution Formulation of Wavelet Systems 


Both the mathematics and the practical interpretations of wavelets seem to 
be best served by using the concept of resolution [link], [link], [link], [link] 
to define the effects of changing scale. To do this, we will start with a 
scaling function φ(t) rather than directly with the wavelet ψ(t). After the 
scaling function is defined from the concept of resolution, the wavelet 
functions will be derived from it. This chapter will give a rather intuitive 
development of these ideas, which will be followed by more rigorous 
arguments in Chapter: The Scaling Function and Scaling Coefficients, 
Wavelet and Wavelet Coefficients. 


This multiresolution formulation is obviously designed to represent signals 
where a single event is decomposed into finer and finer detail, but it turns 
out also to be valuable in representing signals where a time-frequency or 
time-scale description is desired even if no concept of resolution is needed. 
However, there are other cases where multiresolution is not appropriate, 
such as for the short-time Fourier transform or Gabor transform or for local 
sine or cosine bases or lapped orthogonal transforms, which are all 
discussed briefly later in this book. 


Signal Spaces 


In order to talk about the collection of functions or signals that can be 
represented by a sum of scaling functions and/or wavelets, we need some 
ideas and terminology from functional analysis. If these concepts are not 
familiar to you or the information in this section is not sufficient, you may 
want to skip ahead and read Chapter: The Scaling Function and Scaling 
Coefficients, Wavelet and Wavelet Coefficients or [link]. 


A function space is a linear vector space (finite or infinite dimensional) 
where the vectors are functions, the scalars are real numbers (sometimes 
complex numbers), and scalar multiplication and vector addition are similar 
to that done in [link]. The inner product is a scalar a obtained from two 
vectors, f(t) and g(t), by an integral. It is denoted 

Equation: 


a = \langle f(t), g(t) \rangle = \int f^{*}(t)\, g(t)\, dt 


with the range of integration depending on the signal class being 
considered. This inner product defines a norm or “length” of a vector which 
is denoted and defined by 

Equation: 


\|f\| = \sqrt{|\langle f, f \rangle|} 


which is a simple generalization of the geometric operations and definitions 
in three-dimensional Euclidean space. Two signals (vectors) with non-zero 
norms are called orthogonal if their inner product is zero. For example, 
with the Fourier series, we see that sin(t) is orthogonal to sin(2t). 
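
As a small numerical check of these definitions, the sketch below approximates the inner product and the norm by a midpoint rule. It assumes real-valued signals (so the complex conjugate is omitted) and takes one period, 0 to 2π, as the range of integration; the helper names are hypothetical.

```python
import math

def inner_product(f, g, a, b, n=20000):
    """Approximate <f, g> on [a, b] by the midpoint rule (real-valued f and g)."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h) for i in range(n))

def norm(f, a, b):
    """Norm induced by the inner product: sqrt(|<f, f>|)."""
    return math.sqrt(abs(inner_product(f, f, a, b)))

def sin2(t):
    return math.sin(2.0 * t)

ip = inner_product(math.sin, sin2, 0.0, 2.0 * math.pi)   # close to zero
length = norm(math.sin, 0.0, 2.0 * math.pi)              # close to sqrt(pi)
```

The vanishing inner product is what is meant by saying the two sinusoids are orthogonal over a period.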


A space that is particularly important in signal processing is called L²(R). 
This is the space of all functions f(t) with a well defined integral of the 
square of the modulus of the function. The “L” signifies a Lebesgue 
integral, the “2” denotes the integral of the square of the modulus of the 
function, and R states that the independent variable of integration t is a 
number over the whole real line. For a function g(t) to be a member of that 
space is denoted: g ∈ L²(R) or simply g ∈ L². 


Although most of the definitions and derivations are in terms of signals that 
are in L², many of the results hold for larger classes of signals. For 
example, polynomials are not in L² but can be expanded over any finite 
domain by most wavelet systems. 


In order to develop the wavelet expansion described in [link], we will need 
the idea of an expansion set or a basis set. If we start with the vector space 
of signals, S, then if any f(t) ∈ S can be expressed as 
f(t) = Σ_k a_k φ_k(t), the set of functions φ_k(t) is called an expansion 
set for the space S. If the representation is unique, the set is a basis. 
Alternatively, one could start with the expansion set or basis set and define 
the space S as the set of all functions that can be expressed by 
f(t) = Σ_k a_k φ_k(t). This is called the span of the basis set. In several 
cases, the signal spaces that we will need are actually the closure of the 
space spanned by the basis set. That means the space contains not only all 
signals that can be expressed by a linear combination of the basis functions, 
but also the signals which are the limit of these infinite expansions. The 
closure of a space is usually denoted by an over-line. 


The Scaling Function 


In order to use the idea of multiresolution, we will start by defining the 
scaling function and then define the wavelet in terms of it. As described for 
the wavelet in the previous chapter, we define a set of scaling functions in 
terms of integer translates of the basic scaling function by 

Equation: 


\varphi_k(t) = \varphi(t - k), \qquad k \in \mathbf{Z}, \quad \varphi \in L^2. 


The subspace of L²(R) spanned by these functions is defined as 
Equation: 


V_0 = \overline{\operatorname{Span}_k \{\varphi_k(t)\}} 


for all integers k from minus infinity to infinity. The over-bar denotes 
closure. This means that 
Equation: 


f(t) = \sum_k a_k\, \varphi_k(t) \quad \text{for any } f(t) \in V_0. 


One can generally increase the size of the subspace spanned by changing 
the time scale of the scaling functions. A two-dimensional family of 
functions is generated from the basic scaling function by scaling and 
translation by 

Equation: 


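\varphi_{j,k}(t) = 2^{j/2}\, \varphi(2^j t - k) 


whose span over k is 
Equation: 


V_j = \overline{\operatorname{Span}_k \{\varphi_k(2^j t)\}} = \overline{\operatorname{Span}_k \{\varphi_{j,k}(t)\}} 


for all integers k ∈ Z. This means that if f(t) ∈ V_j, then it can be 
expressed as 
Equation: 


f(t) = \sum_k a_k\, \varphi(2^j t + k). 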


For j > 0, the span can be larger since φ_{j,k}(t) is narrower and is translated 
in smaller steps. It, therefore, can represent finer detail. For j < 0, φ_{j,k}(t) 
is wider and is translated in larger steps. So these wider scaling functions 
can represent only coarse information, and the space they span is smaller. 
Another way to think about the effects of a change of scale is in terms of 
resolution. If one talks about photographic or optical resolution, then this 
idea of scale is the same as resolving power. 
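
The role of the factor 2^{j/2} can be checked numerically. The sketch below assumes the Haar pulse (one on [0, 1), zero elsewhere) as the basic scaling function, forms φ_{j,k}(t) = 2^{j/2} φ(2^j t − k), and estimates its energy for a few scales and translations. The interval of integration and the grid size are arbitrary choices made for this illustration.

```python
def haar_phi(t):
    """Assumed basic scaling function: the Haar pulse, 1 on [0, 1), 0 elsewhere."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0

def phi_jk(t, j, k):
    """phi_{j,k}(t) = 2^(j/2) * phi(2^j t - k)."""
    return 2.0 ** (j / 2.0) * haar_phi(2.0 ** j * t - k)

def energy(j, k, a=-16.0, b=16.0, n=32768):
    """Midpoint-rule estimate of the integral of phi_{j,k}(t)^2 over [a, b]."""
    h = (b - a) / n
    return h * sum(phi_jk(a + (i + 0.5) * h, j, k) ** 2 for i in range(n))

# Energy for wide (j < 0), basic (j = 0), and narrow (j > 0) scaling functions
energies = {(j, k): energy(j, k) for j in (-2, 0, 3) for k in (-1, 0, 2)}
```

Every entry of `energies` comes out at one: the pulse gets narrower and taller as j increases, wider and lower as j decreases, while its norm is unchanged.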


Multiresolution Analysis 


In order to agree with our intuitive ideas of scale or resolution, we 
formulate the basic requirement of multiresolution analysis (MRA) [link] 
by requiring a nesting of the spanned spaces as 

Equation: 


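\cdots \subset V_{-2} \subset V_{-1} \subset V_0 \subset V_1 \subset V_2 \subset \cdots \subset L^2 


or 


Equation: 


V_j \subset V_{j+1} \quad \text{for all } j \in \mathbf{Z} 


with 
Equation: 


V_{-\infty} = \{0\}, \qquad V_{\infty} = L^2. 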


The space that contains high resolution signals will contain those of lower 
resolution also. 


Because of the definition of V_j, the spaces have to satisfy a natural scaling 
condition 
Equation: 


f(t) \in V_j \iff f(2t) \in V_{j+1} 


which ensures elements in a space are simply scaled versions of the 
elements in the next space. The relationship of the spanned spaces is 
illustrated in [link]. 


The nesting of the spans of φ(2^j t − k), denoted by V_j and shown in [link] 
and [link] and graphically illustrated in [link], is achieved by requiring that 
φ(t) ∈ V_1, which means that if φ(t) is in V_0, it is also in V_1, the space 
spanned by φ(2t). This means φ(t) can be expressed in terms of a 
weighted sum of shifted φ(2t) as 

Equation: 


\varphi(t) = \sum_n h(n)\, \sqrt{2}\, \varphi(2t - n), \qquad n \in \mathbf{Z} 


where the coefficients h(n) are a sequence of real or perhaps complex 
numbers called the scaling function coefficients (or the scaling filter or the 
scaling vector) and the √2 maintains the norm of the scaling function with 
the scale of two. 


Figure: Nested Vector Spaces Spanned by the Scaling Functions 
(V_3 ⊃ V_2 ⊃ V_1 ⊃ V_0) 


This recursive equation is fundamental to the theory of the scaling functions 
and is, in some ways, analogous to a differential equation with coefficients 
h(n) and solution φ(t) that may or may not exist or be unique. The 
equation is referred to by different names to describe different 
interpretations or points of view. It is called the refinement equation, the 
multiresolution analysis (MRA) equation, or the dilation equation. 


The Haar scaling function is the simple unit-width, unit-height pulse 
function φ(t) shown in [link], and it is obvious that φ(2t) can be used to 
construct φ(t) by 

Equation: 


\varphi(t) = \varphi(2t) + \varphi(2t - 1) 


which means [link] is satisfied for coefficients 
h(0) = 1/√2, h(1) = 1/√2. 


The triangle scaling function (also a first order spline) in [link] satisfies 
[link] for h(0) = 1/(2√2), h(1) = 1/√2, h(2) = 1/(2√2), and the Daubechies 
scaling function shown in the first part of Figure: Daubechies Scaling 
Functions satisfies [link] for h = {0.483, 0.8365, 0.2241, −0.1294}, as do all 
scaling functions for their corresponding scaling coefficients. Indeed, the 
design of wavelet systems is the choosing of the coefficients h(n), and that is 
developed later. 


Figure: Haar and Triangle Scaling Functions. 
Haar: φ(t) = φ(2t) + φ(2t − 1). 
Triangle: φ(t) = ½φ(2t) + φ(2t − 1) + ½φ(2t − 2). 
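
The refinement equation can be verified pointwise for the two simple cases. The sketch below assumes the Haar pulse on [0, 1) and a triangle of unit height supported on [0, 2], and compares each function with the weighted sum of its half-width copies on a grid of points. The grid and the function names are choices made for this illustration.

```python
import math

def haar_phi(t):
    """Haar scaling function: 1 on [0, 1), 0 elsewhere."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0

def triangle_phi(t):
    """Triangle (first-order spline) scaling function, assumed supported on [0, 2]."""
    return max(0.0, 1.0 - abs(t - 1.0))

def refine(phi, h, t):
    """Right-hand side of the refinement equation: sum_n h(n) sqrt(2) phi(2t - n)."""
    return sum(hn * math.sqrt(2.0) * phi(2.0 * t - n) for n, hn in enumerate(h))

r2 = math.sqrt(2.0)
h_haar = [1.0 / r2, 1.0 / r2]
h_tri = [1.0 / (2.0 * r2), 1.0 / r2, 1.0 / (2.0 * r2)]

grid = [i / 64.0 for i in range(-64, 193)]      # t from -1 to 3
err_haar = max(abs(haar_phi(t) - refine(haar_phi, h_haar, t)) for t in grid)
err_tri = max(abs(triangle_phi(t) - refine(triangle_phi, h_tri, t)) for t in grid)
```

Both errors are at the level of rounding, so each function is reproduced by scaled and shifted copies of itself with the stated h(n).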


The Wavelet Functions 


The important features of a signal can better be described or parameterized, 
not by using φ_{j,k}(t) and increasing j to increase the size of the subspace 
spanned by the scaling functions, but by defining a slightly different set of 
functions ψ_{j,k}(t) that span the differences between the spaces spanned by 
the various scales of the scaling function. These functions are the wavelets 
discussed in the introduction of this book. 


There are several advantages to requiring that the scaling functions and 
wavelets be orthogonal. Orthogonal basis functions allow simple 
calculation of expansion coefficients and have a Parseval's theorem that 
allows a partitioning of the signal energy in the wavelet transform domain. 
The orthogonal complement of V_j in V_{j+1} is defined as W_j. This means 
that all members of V_j are orthogonal to all members of W_j. We require 
Equation: 


\langle \varphi_{j,k}(t), \psi_{j,\ell}(t) \rangle = \int \varphi_{j,k}(t)\, \psi_{j,\ell}(t)\, dt = 0 


for all appropriate j, k, ℓ ∈ Z. 


The relationship of the various subspaces can be seen from the following 
expressions. From [link] we see that we may start at any V_j, say at j = 0, 
and write 

Equation: 


V_0 \subset V_1 \subset V_2 \subset \cdots \subset L^2. 


We now define the wavelet spanned subspace W_0 such that 
Equation: 


V_1 = V_0 \oplus W_0 


which extends to 
Equation: 


V_2 = V_0 \oplus W_0 \oplus W_1. 


In general this gives 
Equation: 


L^2 = V_0 \oplus W_0 \oplus W_1 \oplus \cdots 


when V_0 is the initial space spanned by the scaling function φ(t − k). [link] 
pictorially shows the nesting of the scaling function spaces V_j for different 
scales j and how the wavelet spaces are the disjoint differences (except for 
the zero element) or, the orthogonal complements. 


Figure: Scaling Function and Wavelet Vector Spaces 
(W_2 ⊥ W_1 ⊥ W_0 ⊥ V_0 and V_3 ⊃ V_2 ⊃ V_1 ⊃ V_0) 


The scale of the initial space is arbitrary and could be chosen at a higher 
resolution of, say, j = 10 to give 


Equation: 

L^2 = V_{10} \oplus W_{10} \oplus W_{11} \oplus \cdots 

or at a lower resolution such as j = −5 to give 
Equation: 

L^2 = V_{-5} \oplus W_{-5} \oplus W_{-4} \oplus \cdots 


or at even j = −∞ where [link] becomes 
Equation: 


L^2 = \cdots \oplus W_{-2} \oplus W_{-1} \oplus W_0 \oplus W_1 \oplus W_2 \oplus \cdots 


eliminating the scaling space altogether and allowing an expansion of the 
form in [link]. 


Another way to describe the relation of V_0 to the wavelet spaces is noting 
Equation: 


W_{-\infty} \oplus \cdots \oplus W_{-1} = V_0 


which again shows that the scale of the scaling space can be chosen 
arbitrarily. In practice, it is usually chosen to represent the coarsest detail of 
interest in a signal. 


Since these wavelets reside in the space spanned by the next narrower 
scaling function, W_0 ⊂ V_1, they can be represented by a weighted sum of 
shifted scaling function φ(2t) defined in [link] by 

Equation: 


\psi(t) = \sum_n h_1(n)\, \sqrt{2}\, \varphi(2t - n), \qquad n \in \mathbf{Z} 


for some set of coefficients h_1(n). From the requirement that the wavelets 
span the “difference” or orthogonal complement spaces, and the 
orthogonality of integer translates of the wavelet (or scaling function), it is 
shown in the Appendix in [link] that the wavelet coefficients (modulo 
translations by integer multiples of two) are required by orthogonality to be 
related to the scaling function coefficients by 

Equation: 


h_1(n) = (-1)^n\, h(1 - n). 


One example for a finite even length-N h(n) could be 
Equation: 


h_1(n) = (-1)^n\, h(N - 1 - n). 
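
The length-N relation is easy to try out. The sketch below uses the closed-form length-4 Daubechies scaling coefficients, which round to the values {0.483, 0.8365, 0.2241, −0.1294} quoted earlier, builds h_1(n) = (−1)^n h(N − 1 − n), and evaluates the sums over double shifts that orthogonality requires to vanish. The helper names are hypothetical.

```python
import math

def wavelet_from_scaling(h):
    """h1(n) = (-1)^n h(N - 1 - n) for an even length-N scaling filter h."""
    N = len(h)
    return [(-1) ** n * h[N - 1 - n] for n in range(N)]

def dot_shift(a, b, shift):
    """sum_n a(n) b(n - 2*shift), treating out-of-range samples as zero."""
    return sum(a[n] * b[n - 2 * shift]
               for n in range(len(a)) if 0 <= n - 2 * shift < len(b))

s3, r2 = math.sqrt(3.0), math.sqrt(2.0)
# Closed form of the length-4 Daubechies scaling coefficients
h = [(1 + s3) / (4 * r2), (3 + s3) / (4 * r2),
     (3 - s3) / (4 * r2), (1 - s3) / (4 * r2)]
h1 = wavelet_from_scaling(h)

unit_energy = dot_shift(h, h, 0)      # equals 1 for an orthonormal system
self_shift = dot_shift(h, h, 1)       # h against its own shift by two: 0
cross_0 = dot_shift(h, h1, 0)         # scaling against wavelet filter: 0
cross_1 = dot_shift(h, h1, 1)         # same, with a shift by two: 0
```

The wavelet coefficients sum to zero while the scaling coefficients sum to √2, which reflects the zero average of the wavelet.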


The function generated by [link] gives the prototype or mother wavelet 
ψ(t) for a class of expansion functions of the form 
Equation: 


\psi_{j,k}(t) = 2^{j/2}\, \psi(2^j t - k) 


where 2^j is the scaling of t (j is the log_2 of the scale), 2^{−j}k is the 
translation in t, and 2^{j/2} maintains the (perhaps unity) L² norm of the 
wavelet at different scales. 


The Haar and triangle wavelets that are associated with the scaling 
functions in [link] are shown in [link]. For the Haar wavelet, the 
coefficients in [link] are h_1(0) = 1/√2, h_1(1) = −1/√2, which satisfy 
[link]. The Daubechies wavelets associated with the scaling functions in 
Figure: Daubechies Scaling Functions are shown in Figure: Daubechies 
Wavelets with corresponding coefficients given later in the book in Table: 
Scaling Function and Wavelet Coefficients plus their Discrete Moments for 
Daubechies-8 and Table: Daubechies Scaling Function and Wavelet 
Coefficients plus their Moments. 


Figure: Haar (same as ψ_D2) and Triangle (same as ψ_S1) Wavelets. 
Haar: ψ(t) = φ(2t) − φ(2t − 1). 
Triangle: ψ(t) = −½φ(2t) − ½φ(2t − 2) + φ(2t − 1). 


We have now constructed a set of functions φ_k(t) and ψ_{j,k}(t) that could 
span all of L²(R). According to [link], any function g(t) ∈ L²(R) could 
be written 


Equation: 
g(t) = \sum_{k=-\infty}^{\infty} c(k)\, \varphi_k(t) + \sum_{j=0}^{\infty} \sum_{k=-\infty}^{\infty} d(j,k)\, \psi_{j,k}(t) 


as a series expansion in terms of the scaling function and wavelets. 


In this expansion, the first summation in [link] gives a function that is a low 
resolution or coarse approximation of g(t). For each increasing index j in 
the second summation, a higher or finer resolution function is added, which 
adds increasing detail. This is somewhat analogous to a Fourier series 
where the higher frequency terms contain the detail of the signal. 


Later in this book, we will develop the property of having these expansion 
functions form an orthonormal basis or a tight frame, which allows the 
coefficients to be calculated by inner products as 

Equation: 


c(k) = c_0(k) = \langle g(t), \varphi_k(t) \rangle = \int g(t)\, \varphi_k(t)\, dt 


and 


Equation: 


d_j(k) = d(j,k) = \langle g(t), \psi_{j,k}(t) \rangle = \int g(t)\, \psi_{j,k}(t)\, dt. 


The coefficient d(j, k) is sometimes written as d_j(k) to emphasize the 
difference between the time translation index k and the scale parameter j. 
The coefficient c(k) is also sometimes written as c_j(k) or c(j, k) if a more 
general “starting scale” other than j = 0 for the lower limit on the sum in 
[link] is used. 


It is important at this point to recognize the relationship of the scaling 
function part of the expansion [link] to the wavelet part of the expansion. 
From the representation of the nested spaces in [link] we see that the scaling 
function can be defined at any scale j. [link] uses j = 0 to denote the 
family of scaling functions. 


You may want to examine the Haar system example at the end of this 
chapter just now to see these features illustrated. 


The Discrete Wavelet Transform 


Since 
Equation: 


L^2 = V_{j_0} \oplus W_{j_0} \oplus W_{j_0+1} \oplus \cdots 


using [link] and [link], a more general statement of the expansion [link] can 
be given by 
Equation: 


g(t) = \sum_k c_{j_0}(k)\, 2^{j_0/2}\, \varphi(2^{j_0} t - k) + \sum_k \sum_{j=j_0}^{\infty} d_j(k)\, 2^{j/2}\, \psi(2^j t - k) 


or 
Equation: 


g(t) = \sum_k c_{j_0}(k)\, \varphi_{j_0,k}(t) + \sum_k \sum_{j=j_0}^{\infty} d_j(k)\, \psi_{j,k}(t) 

where j_0 could be zero as in [link] and [link], it could be ten as in [link], or 
it could be negative infinity as in [link] and [link] where no scaling 
functions are used. The choice of j_0 sets the coarsest scale whose space is 
spanned by φ_{j_0,k}(t). The rest of L²(R) is spanned by the wavelets which 
provide the high resolution details of the signal. In practice where one is 
given only the samples of a signal, not the signal itself, there is a highest 
resolution when the finest scale is the sample level. 


The coefficients in this wavelet expansion are called the discrete wavelet 
transform (DWT) of the signal g(t). If certain conditions described later are 
satisfied, these wavelet coefficients completely describe the original signal 
and can be used in a way similar to Fourier series coefficients for analysis, 
description, approximation, and filtering. If the wavelet system is 
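orthogonal, these coefficients can be calculated by inner products 
Equation: 


c_{j_0}(k) = \langle g(t), \varphi_{j_0,k}(t) \rangle = \int g(t)\, \varphi_{j_0,k}(t)\, dt 


and 
Equation: 


d_j(k) = \langle g(t), \psi_{j,k}(t) \rangle = \int g(t)\, \psi_{j,k}(t)\, dt. 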


If the scaling function is well-behaved, then at a high scale, the scaling 
function is similar to a Dirac delta function and the inner product simply 
samples the function. In other words, at high enough resolution, samples of 
the signal are very close to the scaling coefficients. More is said about this 
later. It has been shown [link] that wavelet systems form an unconditional 
basis for a large class of signals. That is discussed in Chapter: The Scaling 
Function and Scaling Coefficients, Wavelet and Wavelet Coefficients but means 
that even for the worst case signal in the class, the wavelet expansion 
coefficients drop off rapidly as j and k increase. This is why the DWT is 
efficient for signal and image compression. 
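
The statement that samples approximate the scaling coefficients at a high scale can be illustrated with a case that has a closed form. The sketch below assumes the Haar scaling function and the signal g(t) = sin(t), for which the inner product with φ_{j,k}(t) is an exact integral over one interval of width 2^{−j}. Note the normalization: with the 2^{j/2} factor in φ_{j,k}, it is 2^{j/2} c_j(k) that approaches the sample g(k 2^{−j}).

```python
import math

def haar_scaling_coeff_of_sine(j, k):
    """c_j(k) = <sin(t), 2^(j/2) phi(2^j t - k)> for the Haar phi, in closed form."""
    a, b = k / 2.0 ** j, (k + 1) / 2.0 ** j
    # The integral of sin(t) over [a, b] is cos(a) - cos(b)
    return 2.0 ** (j / 2.0) * (math.cos(a) - math.cos(b))

j = 10   # an assumed "high" scale: intervals of width 1/1024
# Largest gap between rescaled coefficients and samples for 0 <= t < 4
worst = max(
    abs(2.0 ** (j / 2.0) * haar_scaling_coeff_of_sine(j, k) - math.sin(k / 2.0 ** j))
    for k in range(0, 4 * 2 ** j)
)
```

The gap shrinks in proportion to the interval width, so doubling the resolution roughly halves it.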


The DWT is similar to a Fourier series but, in many ways, is much more 
flexible and informative. It can be made periodic like a Fourier series to 
represent periodic signals efficiently. However, unlike a Fourier series, it 
can be used directly on non-periodic transient signals with excellent results. 
An example of the DWT of a pulse was illustrated in Figure: Two-Stage 
Two-Band Analysis Tree. Other examples are illustrated just after the next 
section. 


A Parseval's Theorem 


If the scaling functions and wavelets form an orthonormal basis [footnote], 
there is a Parseval's theorem that relates the energy of the signal g(t) to the 
energy in each of the components and their wavelet coefficients. That is one 
reason why orthonormality is important. 

or a tight frame defined in Chapter: Bases, Orthogonal Bases, Biorthogonal 


For the general wavelet expansion of [link] or [link], Parseval's theorem is 
Equation: 


\int |g(t)|^2\, dt = \sum_{k=-\infty}^{\infty} |c(k)|^2 + \sum_{j=0}^{\infty} \sum_{k=-\infty}^{\infty} |d_j(k)|^2 


with the energy in the expansion domain partitioned in time by k and in 
scale by j. Indeed, it is this partitioning of the time-scale parameter plane 
that describes the DWT. If the expansion system is a tight frame, there is a 
constant multiplier in [link] caused by the redundancy. 
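
A discrete analogue of this energy partition can be checked with a few lines of Python. The sketch below assumes the orthonormal Haar system and treats the eight samples of a short pulse as the finest-scale scaling coefficients; the coarser coefficients then follow from repeated normalized sums and differences of neighboring pairs. The function name and the test signal are illustrative.

```python
import math

def haar_dwt(x):
    """Full Haar DWT of a length-2^J list: returns (c, [d_coarsest, ..., d_finest])."""
    s = 1.0 / math.sqrt(2.0)
    c, details = list(x), []
    while len(c) > 1:
        avg = [s * (c[2 * i] + c[2 * i + 1]) for i in range(len(c) // 2)]
        dif = [s * (c[2 * i] - c[2 * i + 1]) for i in range(len(c) // 2)]
        details.insert(0, dif)      # keep the coarsest scale first
        c = avg
    return c, details

x = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]    # a simple pulse
c, details = haar_dwt(x)
signal_energy = sum(v * v for v in x)
coeff_energy = sum(v * v for v in c) + sum(v * v for d in details for v in d)
```

The two energies agree up to rounding, and the individual squared coefficients show how that energy is distributed over translation k and scale j.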


Daubechies [link], [link] showed that it is possible for the scaling function 
and the wavelets to have compact support (i.e., be nonzero only over a 
finite region) and to be orthonormal. This makes possible the time 
localization that we desire. We now have a framework for describing 
signals that has features of short-time Fourier analysis and of Gabor-based 
analysis but using a new variable, scale. For the short-time Fourier 
transform, orthogonality and good time-frequency resolution are 
incompatible according to the Balian-Low-Coifman-Semmes theorem 
[link], [link]. More precisely, if the short-time Fourier transform is 
orthogonal, either the time or the frequency resolution is poor and the 
trade-off is inflexible. This is not the case for the wavelet transform. Also, note 
that there is a variety of scaling functions and wavelets that can be obtained 
by choosing different coefficients h(n) in [link]. 


Donoho [link] has noted that wavelets are an unconditional basis for a very 
wide class of signals. This means wavelet expansions of signals have 
coefficients that drop off rapidly and therefore the signal can be efficiently 
represented by a small number of them. 


We have first developed the basic ideas of the discrete wavelet system using 
a scaling multiplier of 2 in the defining [link]. This is called a two-band 
wavelet system because of the two channels or bands in the related filter 
banks discussed in Chapter: Filter Banks and the Discrete Wavelet 
Transform and Chapter: Filter Banks and Transmultiplexers. It is also 
possible to define a more general discrete wavelet system using 
φ(t) = Σ_n h(n) √M φ(Mt − n) where M is an integer [link]. This is 


Wavelets. The details of numerically calculating the DWT are discussed in 
Chapter: Calculation of the Discrete Wavelet Transform where special 
forms for periodic signals are used. 


Display of the Discrete Wavelet Transform and the Wavelet 
Expansion 


It is important to have an informative way of displaying or visualizing the 
wavelet expansion and transform. This is complicated in that the DWT is a 
real-valued function of two integer indices and, therefore, needs a 
two-dimensional display or plot. This problem is somewhat analogous to 
plotting the Fourier transform, which is a complex-valued function. 


There seem to be five displays that show the various characteristics of the 
DWT well: 


1. The most basic time-domain description of a signal is the signal itself 
(or, for most cases, samples of the signal) but it gives no frequency or 
scale information. A very interesting property of the DWT (and one 
different from the Fourier series) is that for a high starting scale j_0 in 
[link], samples of the signal are the DWT at that scale. This is an 
extreme case, but it shows the flexibility of the DWT and will be 
explained later. 

2. The most basic wavelet-domain description is a three-dimensional plot 
of the expansion coefficients or DWT values c(k) and d_j(k) over the 
j, k plane. This is difficult to do on a two-dimensional page or display 
screen, but we show a form of that in [link] and [link]. 

3. A very informative picture of the effects of scale can be shown by 
generating time functions f_j(t) at each scale by summing [link] over k 
so that 
Equation: 
f(t) = f_{j_0}(t) + \sum_j f_j(t) 
where 
Equation: 
f_{j_0}(t) = \sum_k c(k)\, \varphi(t - k) 
Equation: 
f_j(t) = \sum_k d_j(k)\, 2^{j/2}\, \psi(2^j t - k). 

This illustrates the components of the signal at each scale and is shown 
in [link] and [link]. 


4. Another illustration that shows the time localization of the wavelet 
expansion is obtained by generating time functions f_k(t) at each 
translation by summing [link] over j so that 
Equation: 


f(t) = \sum_k f_k(t) 


where 
Equation: 


f_k(t) = c(k)\, \varphi(t - k) + \sum_j d_j(k)\, 2^{j/2}\, \psi(2^j t - k). 


This illustrates the components of the signal at each integer translation. 

5. There is another rather different display based on a partitioning of the 
time-scale plane as if the time translation index and scale index were 
continuous variables. This display is called “tiling the time-frequency 
plane." Because it is a different type of display and is developed and 
illustrated in Chapter: Calculation of the Discrete Wavelet Transform, 
it will not be illustrated here. 
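
The third display, the components of a signal at each scale, can be sketched for a short sequence. The code below assumes the Haar system, for which the projection onto V_j is the block-wise mean over intervals of width 2^{−j} and the component in W_j is the difference between the projections onto V_{j+1} and V_j. The names and the eight-sample signal are illustrative, not from the book's programs.

```python
def block_means(x, j):
    """Haar projection onto V_j: replace each of 2^j equal blocks by its mean."""
    block = len(x) // 2 ** j
    out = []
    for k in range(2 ** j):
        m = sum(x[k * block:(k + 1) * block]) / block
        out.extend([m] * block)
    return out

x = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]   # eight samples, so J = 3 scales
J = 3
f_coarse = block_means(x, 0)                    # component in V_0
f_detail = []                                   # components in W_0, W_1, W_2
for j in range(J):
    hi, lo = block_means(x, j + 1), block_means(x, j)
    f_detail.append([a - b for a, b in zip(hi, lo)])

# Summing the coarse part and all detail parts recovers the signal
recon = [f_coarse[i] + sum(d[i] for d in f_detail) for i in range(len(x))]
```

Plotting `f_coarse` and each entry of `f_detail` against time gives a display of this kind; each detail component has zero average, and the finest one is nonzero only near the edges of the pulse that fall inside a pair of samples.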


Experimentation with these displays can be very informative in terms of the 
properties and capabilities of the wavelet transform, the effects of particular 
wavelet systems, and the way a wavelet expansion displays the various 
attributes or characteristics of a signal. 


Examples of Wavelet Expansions 


In this section, we will try to show the way a wavelet expansion 
decomposes a signal and what the components look like at different scales. 
These expansions use what is called a length-8 Daubechies basic wavelet 
(developed in Chapter: Regularity, Moments, and Wavelet System Design), 
but that is not the main point here. The local nature of the wavelet 
decomposition is the topic of this section. 


These examples are rather standard ones, some taken from David Donoho's 
papers and web page. The first is a decomposition of a piecewise linear 
function to show how edges and constants are handled. A characteristic of 
Daubechies systems is that low order polynomials are completely contained 
in the scaling function spaces V_j and need no wavelets. This means that 
when a section of a signal is a section of a polynomial (such as a straight 
line), there are no wavelet expansion coefficients d_j(k), but when the 
calculation of the expansion coefficients overlaps an edge, there is a 
wavelet component. This is illustrated well in [link] where the high 
resolution scales give a very accurate location of the edges and this spreads 
out over k at the lower scales. This gives a hint of how the DWT could be 
used for edge detection and how the large number of small or zero 
expansion coefficients could be used for compression. 


Figure: Discrete Wavelet Transform of the Houston Skyline, using ψ_D8' with a 
Gain of √2 for Each Higher Scale 


[link] shows the approximations of the skyline signal in the various scaling 
function spaces V_j. This illustrates just how the approximations progress, 
giving more and more resolution at higher scales. The fact that the higher 
scales give more detail is similar to Fourier methods, but the localization is 
new. [link] illustrates the individual wavelet decomposition by showing the 
components of the signal that exist in the wavelet spaces W_j at different 
scales j. This shows the same expansion as [link], but with the wavelet 
components given separately rather than being cumulatively added to the 
scaling function. Notice how the large objects show up at the lower 
resolution. Groups of buildings and individual buildings are resolved 
according to their width. The edges, however, are located at the higher 
resolutions and are located very accurately. 


Figure: Projection of the Houston Skyline Signal onto V Spaces using φ_D8' 


Figure: Projection of the Houston Skyline Signal onto W Spaces using ψ_D8' 
(panels include (b) Projection onto W_0, (d) Projection onto W_2, 
(g) Projection onto W_5, (h) Projection onto W_6) 


The second example uses a chirp or doppler signal to illustrate how a time- 
varying frequency is described by the scale decomposition. [link] gives the 
coefficients of the DWT directly as a function of j and k. Notice how the 
location in k tracks the frequencies in the signal in a way the Fourier 
transform cannot. [link] and [link] show the scaling function 
approximations and the wavelet decomposition of this chirp signal. Again, 
notice in this type of display how the “location” of the frequencies is 
shown. 


Figure: Discrete Wavelet Transform of a Doppler, using ψ_D8' with a gain of 
√2 for each higher scale. 


An Example of the Haar Wavelet System 


In this section, we can illustrate our mathematical discussion with a more 
complete example. In 1910, Haar [link] showed that certain square wave 
functions could be translated and scaled to create a basis set that spans L². 
This is illustrated in [link]. Years later, it was seen that Haar's system is a 
particular wavelet system. 


If we choose our scaling function to have compact support over 0 < t < 1, 
then a solution to [link] is a scaling function that is a simple rectangle 
function 
Equation: 


\varphi(t) = \begin{cases} 1 & \text{if } 0 < t < 1 \\ 0 & \text{otherwise} \end{cases} 


with only two nonzero coefficients h(0) = h(1) = 1/√2, and [link] and 
[link] require the wavelet to be 
Equation: 


\psi(t) = \begin{cases} 1 & \text{for } 0 < t < 0.5 \\ -1 & \text{for } 0.5 < t < 1 \\ 0 & \text{otherwise} \end{cases} 


with only two nonzero coefficients h_1(0) = 1/√2 and h_1(1) = −1/√2. 


Haar showed that as j → ∞, V_j → L². We have an approximation 
made up of step functions approaching any square integrable function. 


V_0 is the space spanned by φ(t − k) which is the space of piecewise 
constant functions over integers, a rather limited space, but nontrivial. The 
next higher resolution space V_1 is spanned by φ(2t − k) which allows a 
somewhat more interesting class of signals which does include V_0. As we 
consider higher values of scale j, the space V_j spanned by φ(2^j t − k) 
becomes better able to approximate arbitrary functions or signals by finer 
and finer piecewise constant functions. 


Figure: Projection of the Doppler Signal onto V Spaces using φ_D8' 
(panels include (a) Projection onto V_0) 


Figure: Projection of the Doppler Signal onto W Spaces using ψ_D8' 
(panels include (h) Projection onto W_6) 


The Haar functions are illustrated in [link] where the first column contains 
the simple constant basis function that spans V_0, the second column 
contains the unit pulse of width one half and the one translate necessary to 
span V_1. The third column contains four translations of a pulse of width one 
fourth and the fourth contains eight translations of a pulse of width one 
eighth. This shows clearly how increasing the scale allows greater and 
greater detail to be realized. However, using only the scaling function does 
not allow the decomposition described in the introduction. For that we need 
the wavelet. Rather than use the scaling functions φ(8t − k) in V_3, we will 
use the orthogonal decomposition 


Equation: 
V_3 = V_2 \oplus W_2 
which is the same as 
Equation: 
\operatorname{Span}_k \{\varphi(8t - k)\} = \operatorname{Span}_k \{\varphi(4t - k)\} \oplus \operatorname{Span}_k \{\psi(4t - k)\} 


which means there are two sets of orthogonal basis functions that span V_3, 
one in terms of j = 3 scaling functions, and the other in terms of half as 
many coarser j = 2 scaling functions plus the details contained in the j = 2 
wavelets. This is illustrated in [link]. 


Figure: Haar Scaling Functions and Wavelets that Span V_j 
(columns: V_0, V_1, V_2, V_3) 


The V_2 can be further decomposed into 
Equation: 


V_2 = V_1 \oplus W_1 


which is the same as 
Equation: 


\operatorname{Span}_k \{\varphi(4t - k)\} = \operatorname{Span}_k \{\varphi(2t - k)\} \oplus \operatorname{Span}_k \{\psi(2t - k)\} 


and this is illustrated in [link]. This gives V_1 also to be decomposed as 
Equation: 


V_1 = V_0 \oplus W_0 


which is shown in [link]. By continuing to decompose the space spanned by 
the scaling function until the space is one constant, the complete 
decomposition of V_3 is obtained. This is symbolically shown in [link]. 


Figure: Haar Scaling Functions and Wavelets Decomposition of V_3 


Figure: Haar Scaling Functions and Wavelets Decomposition of V_1 
(V_1 = V_0 ⊕ W_0) 


Finally we look at an approximation to a smooth function constructed from 
the basis elements in $V_3 = V_0 \oplus W_0 \oplus W_1 \oplus W_2$. Because the Haar 
functions form an orthogonal basis in each subspace, they can produce an 
optimal least squared error approximation to the smooth function. One can 
easily imagine the effects of adding a higher resolution “layer” of functions 
to $W_3$ giving an approximation residing in $V_4$. Notice that these functions 
satisfy all of the conditions that we have considered for scaling functions 
and wavelets. The basic wavelet is indeed an oscillating function which, in 
fact, has an average of zero and which will produce finer and finer detail as 
it is scaled and translated. 
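
As a rough illustration of the least squared error property, the following sketch estimates the Haar approximation of a smooth function in $V_3$ on the unit interval and then splits it level by level. It is a minimal sketch under stated assumptions: the test function, the interval $[0, 1)$, the midpoint-rule averaging, and the names `haar_projection` and `split` are all illustrative and not taken from the text. The lists hold the constant value of the approximation on each cell, not the normalized expansion coefficients.

```python
import math

def haar_projection(f, j, samples_per_cell=64):
    """Estimate the least squares approximation of f in the Haar space V_j on [0, 1).

    Functions in V_j are constant on each cell [k / 2**j, (k + 1) / 2**j), so the
    best constant on a cell is the average of f there (midpoint-rule estimate).
    """
    cells = 2 ** j
    width = 1.0 / cells
    values = []
    for k in range(cells):
        total = 0.0
        for i in range(samples_per_cell):
            t = (k + (i + 0.5) / samples_per_cell) * width
            total += f(t)
        values.append(total / samples_per_cell)
    return values

def split(fine):
    """Split cell values in V_{j+1} into coarse averages (V_j) and details (W_j)."""
    half = len(fine) // 2
    coarse = [(fine[2 * k] + fine[2 * k + 1]) / 2.0 for k in range(half)]
    detail = [(fine[2 * k] - fine[2 * k + 1]) / 2.0 for k in range(half)]
    return coarse, detail

def smooth(t):
    # Hypothetical smooth test function (an assumption for this sketch)
    return math.sin(2.0 * math.pi * t)

v3 = haar_projection(smooth, 3)   # eight cell values: the approximation in V_3
v2, w2 = split(v3)                # V_3 = V_2 (+) W_2
v1, w1 = split(v2)                # V_2 = V_1 (+) W_1
v0, w0 = split(v1)                # V_1 = V_0 (+) W_0
```

Adding a detail value to its coarse value (and subtracting it for the right-hand half of the cell) restores the finer approximation, which is the layering of detail described above.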


The multiresolution character of the scaling function and wavelet system is 
easily seen from [link] where a signal residing in $V_3$ can be expressed in 
terms of a sum of eight shifted scaling functions at scale $j = 3$ or a sum of 
four shifted scaling functions and four shifted wavelets at a scale of $j = 2$. 
In the second case, the sum of four scaling functions gives a low resolution 
approximation to the signal with the four wavelets giving the higher 
resolution “detail”. The four shifted scaling functions could be further 
decomposed into coarser scaling functions and wavelets as illustrated in 
[link] and still further decomposed as shown in [link]. 


[link] shows the Haar approximations of a test function in various 
resolutions. The signal is an example of a mixture of a pure sine wave, 
which would have a perfectly localized Fourier domain representation, and 
two discontinuities, which are completely localized in the time domain. The 
component at the coarsest scale is simply the average of the signal. As we 
include more and more wavelet scales, the approximation becomes close to 
the original signal. 


This chapter has skipped over some details in an attempt to communicate 
the general idea of the method. The conditions that can or must be satisfied 
and the resulting properties, together with examples, are discussed in the 
following chapters and/or in the references. 
Filter Banks and the Discrete Wavelet Transform 


In many applications, one never has to deal directly with the scaling functions or wavelets. Only the coefficients 
$h(n)$ and $h_1(n)$ in the defining equations [link] and [link] and $c(k)$ and $d_j(k)$ in the expansions [link], [link], 
and [link] need be considered, and they can be viewed as digital filters and digital signals respectively [link], 
[link]. While it is possible to develop most of the results of wavelet theory using only filter banks, we feel that 
both the signal expansion point of view and the filter bank point of view are necessary for a real understanding 
of this new tool. 


Analysis — From Fine Scale to Coarse Scale 


In order to work directly with the wavelet transform coefficients, we will derive the relationship between the 
expansion coefficients at a lower scale level in terms of those at a higher scale. Starting with the basic recursion 
equation from [link] 

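Equation: 
$$\varphi(t) = \sum_n h(n)\,\sqrt{2}\,\varphi(2t - n)$$

and assuming a unique solution exists, we scale and translate the time variable to give 
Equation: 
$$\varphi(2^j t - k) = \sum_n h(n)\,\sqrt{2}\,\varphi(2(2^j t - k) - n) = \sum_n h(n)\,\sqrt{2}\,\varphi(2^{j+1} t - 2k - n)$$

which, after changing variables $m = 2k + n$, becomes 
Equation: 
$$\varphi(2^j t - k) = \sum_m h(m - 2k)\,\sqrt{2}\,\varphi(2^{j+1} t - m).$$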


If we denote $V_j$ as 
Equation: 
$$V_j = \operatorname{Span}_k \{2^{j/2}\,\varphi(2^j t - k)\}$$

then 
Equation: 
$$f(t) \in V_{j+1} \;\Rightarrow\; f(t) = \sum_k c_{j+1}(k)\,2^{(j+1)/2}\,\varphi(2^{j+1} t - k)$$

is expressible at a scale of $j + 1$ with scaling functions only and no wavelets. At one scale lower resolution, 
wavelets are necessary for the “detail” not available at a scale of $j$. We have 
Equation: 
$$f(t) = \sum_k c_j(k)\,2^{j/2}\,\varphi(2^j t - k) + \sum_k d_j(k)\,2^{j/2}\,\psi(2^j t - k)$$


where the $2^{j/2}$ terms maintain the unity norm of the basis functions at various scales. If $\varphi_{j,k}(t)$ and $\psi_{j,k}(t)$ are 
orthonormal or a tight frame, the $j$ level scaling coefficients are found by taking the inner product 
Equation: 
$$c_j(k) = \langle f(t), \varphi_{j,k}(t) \rangle = \int f(t)\,2^{j/2}\,\varphi(2^j t - k)\,dt$$

which, by using [link] and interchanging the sum and integral, can be written as 
Equation: 
$$c_j(k) = \sum_m h(m - 2k) \int f(t)\,2^{(j+1)/2}\,\varphi(2^{j+1} t - m)\,dt$$

but the integral is the inner product with the scaling function at a scale of $j + 1$ giving 
Equation: 
$$c_j(k) = \sum_m h(m - 2k)\,c_{j+1}(m).$$

The corresponding relationship for the wavelet coefficients is 
Equation: 
$$d_j(k) = \sum_m h_1(m - 2k)\,c_{j+1}(m).$$
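
The two recursions for $c_j(k)$ and $d_j(k)$ can be evaluated directly as nested sums. The sketch below does this for a short finite sequence. Several things in it are assumptions made only for illustration: the Haar coefficients, the periodic wrap-around used to handle the ends of the sequence, the sample data, and the function name `analysis_step`.

```python
import math

def analysis_step(c_fine, h, h1):
    """One analysis stage: c_j(k) = sum_m h(m - 2k) c_{j+1}(m), and likewise d_j(k) with h1.

    Indices that run past the end of c_fine wrap around (periodic extension),
    which is an assumption of this sketch and not part of the derivation above.
    """
    n = len(c_fine)
    c_coarse, d_coarse = [], []
    for k in range(n // 2):
        c_k = 0.0
        d_k = 0.0
        for i in range(len(h)):          # i plays the role of m - 2k
            m = (2 * k + i) % n
            c_k += h[i] * c_fine[m]
            d_k += h1[i] * c_fine[m]
        c_coarse.append(c_k)
        d_coarse.append(d_k)
    return c_coarse, d_coarse

s = 1.0 / math.sqrt(2.0)
haar_h = [s, s]        # Haar scaling coefficients h(n)
haar_h1 = [s, -s]      # Haar wavelet coefficients h_1(n)

c_fine = [4.0, 2.0, 5.0, 5.0]                        # hypothetical c_{j+1}(m)
c_coarse, d_coarse = analysis_step(c_fine, haar_h, haar_h1)
```

For an orthonormal system such as Haar, the sum of squares of `c_fine` equals the combined sum of squares of `c_coarse` and `d_coarse`, which gives a quick sanity check on the coefficients used.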


Filtering and Down-Sampling or Decimating 


In the discipline of digital signal processing, the “filtering” of a sequence of numbers (the input signal) is 
achieved by convolving the sequence with another set of numbers called the filter coefficients, taps, weights, or 
impulse response. This makes intuitive sense if you think of a moving average with the coefficients being the 
weights. For an input sequence $x(n)$ and filter coefficients $h(n)$, the output sequence $y(n)$ is given by 
Equation: 
$$y(n) = \sum_{k=0}^{N-1} h(k)\,x(n - k)$$


There is a large literature on digital filters and how to design them [link], [link]. If the number of filter 
coefficients $N$ is finite, the filter is called a Finite Impulse Response (FIR) filter. If the number is infinite, it is 
called an Infinite Impulse Response (IIR) filter. The design problem is the choice of the $h(n)$ to obtain some desired 
effect, often to remove noise or separate signals [link], [link]. 
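
The convolution sum for an FIR filter can be written out in a few lines. The following is a minimal sketch, assuming a finite input that is zero outside its given samples; the function name `fir_filter`, the two-point moving average, and the data are illustrative choices.

```python
def fir_filter(x, h):
    """Convolve input x with FIR coefficients h: y(n) = sum_k h(k) x(n - k).

    The input is treated as zero outside its given samples, so the output
    has len(x) + len(h) - 1 terms.
    """
    y = []
    for n in range(len(x) + len(h) - 1):
        acc = 0.0
        for k in range(len(h)):
            if 0 <= n - k < len(x):
                acc += h[k] * x[n - k]
        y.append(acc)
    return y

x = [1.0, 3.0, 5.0, 7.0]     # hypothetical input samples
h = [0.5, 0.5]               # two-point moving average weights
y = fir_filter(x, h)         # [0.5, 2.0, 4.0, 6.0, 3.5]
```

Note that the output is one term longer than the input here, a point that matters later when counting coefficients in the nonperiodic transform.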


The Down-Sampler or Decimator (input $x(n)$, output $x(2n)$) 


In multirate digital filters, there is an assumed relation between the integer index $n$ in the signal $x(n)$ and time. 
Often the sequence of numbers are simply evenly spaced samples of a function of time. Two basic operations in 
multirate filters are the down-sampler and the up-sampler. The down-sampler (sometimes simply called a 
sampler or a decimator) takes a signal $x(n)$ as an input and produces an output of $y(n) = x(2n)$. This is 
symbolically shown in [link]. In some cases, the down-sampling is by a factor other than two and in some cases, 
the output is the odd index terms $y(n) = x(2n + 1)$, but this will be explicitly stated if it is important. 


In down-sampling, there is clearly the possibility of losing information since half of the data is discarded. The 
effect in the frequency domain (Fourier transform) is called aliasing which states that the result of this loss of 
information is a mixing up of frequency components [link], [link]. Only if the original signal is band-limited 
(half of the Fourier coefficients are zero) is there no loss of information caused by down-sampling. 
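
A small numerical case makes the aliasing concrete. In this sketch (the signal and the name `downsample` are assumptions chosen for illustration), the fastest possible oscillation, alternating between $+1$ and $-1$, is down-sampled by two and becomes indistinguishable from a constant.

```python
import math

def downsample(x):
    """Keep the even-indexed terms: y(n) = x(2n)."""
    return x[::2]

# Highest-frequency discrete sinusoid, cos(pi * n) = +1, -1, +1, -1, ...
x_high = [math.cos(math.pi * n) for n in range(8)]
y = downsample(x_high)   # every retained sample equals 1 (up to rounding)
```

After down-sampling, the high-frequency input looks like a zero-frequency signal, so the two cannot be told apart from the output alone.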


We talk about digital filtering and down-sampling because that is exactly what [link] and [link] do. These 
equations show that the scaling and wavelet coefficients at different levels of scale can be obtained by 
convolving the expansion coefficients at scale $j$ by the time-reversed recursion coefficients $h(-n)$ and $h_1(-n)$ 
then down-sampling or decimating (taking every other term, the even terms) to give the expansion coefficients 
at the next level of $j - 1$. In other words, the scale-$j$ coefficients are “filtered” by two FIR digital filters with 
coefficients $h(-n)$ and $h_1(-n)$ after which down-sampling gives the next coarser scaling and wavelet 
coefficients. These structures implement Mallat's algorithm [link], [link] and have been developed in the 
engineering literature on filter banks, quadrature mirror filters (QMF), conjugate filters, and perfect 
reconstruction filter banks [link], [link], [link], [link], [link], [link], [link] and are expanded somewhat in 
Chapter: Filter Banks and Transmultiplexers of this book. Mallat, Daubechies, and others showed the relation of 
wavelet coefficient calculation and filter banks. The implementation of [link] and [link] is illustrated in [link] 
where the down-pointing arrows denote a decimation or down-sampling by two and the other boxes denote FIR 
filtering or a convolution by $h(-n)$ or $h_1(-n)$. To ease notation, we use both $h(n)$ and $h_0(n)$ to denote the 
scaling function coefficients for the dilation equation [link]. 


Two-Band Analysis Bank 


Two-Stage Two-Band Analysis Tree (input $c_{j+1}$, coarsest scaling output $c_{j-1}$) 


As we will see in Chapter: The Scaling Function and Scaling Coefficients, Wavelet and Wavelet Coefficients, 
the FIR filter implemented by $h(-n)$ is a lowpass filter, and the one implemented by $h_1(-n)$ is a highpass 
filter. Note the average number of data points out of this system is the same as the number in. The number is 
doubled by having two filters; then it is halved by the decimation back to the original number. This means there 
is the possibility that no information has been lost and it will be possible to completely recover the original 
signal. As we shall see, that is indeed the case. The aliasing occurring in the upper bank can be “undone” or 
cancelled by using the signal from the lower bank. This is the idea behind perfect reconstruction in filter bank 
theory [link], [link]. 


This splitting, filtering, and decimation can be repeated on the scaling coefficients to give the two-scale structure 
in [link]. Repeating this on the scaling coefficients is called iterating the filter bank. Iterating the filter bank 
again gives us the three-scale structure in [link]. 


The frequency response of a digital filter is the discrete-time Fourier transform of its impulse response 
(coefficients) h(n). That is given by 
Equation: 
$$H(\omega) = \sum_{n=-\infty}^{\infty} h(n)\,e^{i \omega n}.$$


The magnitude of this complex-valued function gives the ratio of the output to the input of the filter for a 
sampled sinusoid at a frequency of $\omega$ in radians per second. The angle of $H(\omega)$ is the phase shift between the 
output and input. 
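
The frequency response sum is easy to evaluate for a short FIR filter. The sketch below assumes the Haar coefficients as an example and uses the illustrative name `frequency_response`; it evaluates $|H(\omega)|$ at $\omega = 0$ and $\omega = \pi$ to show which filter passes low frequencies and which passes high frequencies.

```python
import cmath
import math

def frequency_response(h, omega):
    """Evaluate H(omega) = sum_n h(n) exp(i * omega * n) for FIR coefficients h(0..N-1)."""
    return sum(h[n] * cmath.exp(1j * omega * n) for n in range(len(h)))

s = 1.0 / math.sqrt(2.0)
h_low = [s, s]       # Haar scaling filter (assumed example)
h_high = [s, -s]     # Haar wavelet filter (assumed example)

low_at_dc = abs(frequency_response(h_low, 0.0))         # sqrt(2): passes omega = 0
low_at_pi = abs(frequency_response(h_low, math.pi))     # about 0: blocks omega = pi
high_at_dc = abs(frequency_response(h_high, 0.0))       # 0: blocks omega = 0
high_at_pi = abs(frequency_response(h_high, math.pi))   # sqrt(2): passes omega = pi
```

Time reversal of real coefficients conjugates $H(\omega)$, so the magnitudes computed here apply equally to $h(-n)$ and $h_1(-n)$.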


The first stage of two banks divides the spectrum of $c_{j+1}(k)$ into a lowpass and highpass band, resulting in the 
scaling coefficients and wavelet coefficients at lower scale $c_j(k)$ and $d_j(k)$. The second stage then divides that 
lowpass band into another lower lowpass band and a bandpass band. The first stage divides the spectrum into 
two equal parts. The second stage divides the lower half into quarters and so on. This results in a logarithmic set 
of bandwidths as illustrated in [link]. These are called “constant-Q” filters in filter bank language because the 
ratio of the band width to the center frequency of the band is constant. It is also interesting to note that a musical 
scale defines octaves in a similar way and that the ear responds to frequencies in a similar logarithmic fashion. 


For any practical signal that is bandlimited, there will be an upper scale $j = J$, above which the wavelet 
coefficients, $d_j(k)$, are negligibly small [link]. By starting with a high resolution description of a signal in terms 
of the scaling coefficients $c_J$, the analysis tree calculates the DWT 
down to as low a resolution, $j = j_0$, as desired by having $J - j_0$ stages. So, for $f(t) \in V_J$, using [link] we 
have 


Equation: 
$$\begin{aligned}
f(t) &= \sum_k c_J(k)\,\varphi_{J,k}(t) \\
     &= \sum_k c_{J-1}(k)\,\varphi_{J-1,k}(t) + \sum_k d_{J-1}(k)\,\psi_{J-1,k}(t) \\
f(t) &= \sum_k c_{J-2}(k)\,\varphi_{J-2,k}(t) + \sum_k \sum_{j=J-2}^{J-1} d_j(k)\,\psi_{j,k}(t) \\
f(t) &= \sum_k c_{j_0}(k)\,\varphi_{j_0,k}(t) + \sum_k \sum_{j=j_0}^{J-1} d_j(k)\,\psi_{j,k}(t)
\end{aligned}$$


which is a finite scale version of [link]. We will discuss the choice of $j_0$ and $J$ further in Chapter: Calculation of 
the Discrete Wavelet Transform. 
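
Iterating the filter bank on the scaling coefficients can be sketched as a short loop. This example assumes the Haar filters (so each stage only needs adjacent pairs), a length-8 starting sequence standing in for $c_J$, and the illustrative names `haar_step` and `analysis_tree`.

```python
import math

def haar_step(c):
    """One Haar analysis stage: pairwise scaled sums (scaling) and differences (wavelet)."""
    s = 1.0 / math.sqrt(2.0)
    half = len(c) // 2
    coarse = [s * (c[2 * k] + c[2 * k + 1]) for k in range(half)]
    detail = [s * (c[2 * k] - c[2 * k + 1]) for k in range(half)]
    return coarse, detail

def analysis_tree(c_top, stages):
    """Apply `stages` analysis stages, always re-filtering the scaling coefficients."""
    c = list(c_top)
    details = []               # details[0] is the finest scale, details[-1] the coarsest
    for _ in range(stages):
        c, d = haar_step(c)
        details.append(d)
    return c, details

c_J = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]      # hypothetical scaling coefficients at scale J
c_low, details = analysis_tree(c_J, 3)               # J - j0 = 3 stages
```

The three stages produce detail sequences of lengths 4, 2, and 1 plus one remaining scaling coefficient, so the total number of coefficients is still eight.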


Three-Stage Two-Band Analysis Tree 
Frequency Bands for the Analysis Tree 


Synthesis — From Coarse Scale to Fine Scale 


As one would expect, a reconstruction of the original fine scale coefficients of the signal can be made from a 
combination of the scaling function and wavelet coefficients at a coarse resolution. This is derived by 
considering a signal in the $j + 1$ scaling function space $f(t) \in V_{j+1}$. This function can be written in terms of 
the scaling function as 
Equation: 
$$f(t) = \sum_k c_{j+1}(k)\,2^{(j+1)/2}\,\varphi(2^{j+1} t - k)$$

or in terms of the next scale (which also requires wavelets) as 
Equation: 
$$f(t) = \sum_k c_j(k)\,2^{j/2}\,\varphi(2^j t - k) + \sum_k d_j(k)\,2^{j/2}\,\psi(2^j t - k).$$


Substituting [link] and [link] into [link] gives 
Equation: 
$$f(t) = \sum_k c_j(k) \sum_n h(n)\,2^{(j+1)/2}\,\varphi(2^{j+1} t - 2k - n) + \sum_k d_j(k) \sum_n h_1(n)\,2^{(j+1)/2}\,\varphi(2^{j+1} t - 2k - n).$$

Because all of these functions are orthonormal, multiplying [link] and [link] by $\varphi(2^{j+1} t - k')$ and integrating 
evaluates the coefficient as 
Equation: 
$$c_{j+1}(k) = \sum_m c_j(m)\,h(k - 2m) + \sum_m d_j(m)\,h_1(k - 2m).$$
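
The synthesis recursion for $c_{j+1}(k)$ can also be evaluated directly. The sketch below is only an illustration under assumptions: it uses the Haar coefficients, periodic wrap-around at the ends, hypothetical coarse coefficients, and the name `synthesis_step`.

```python
import math

def synthesis_step(c_coarse, d_coarse, h, h1):
    """One synthesis stage: c_{j+1}(k) = sum_m c_j(m) h(k - 2m) + sum_m d_j(m) h1(k - 2m).

    Indices wrap around periodically, which is an assumption of this sketch.
    """
    n = 2 * len(c_coarse)
    c_fine = [0.0] * n
    for m in range(len(c_coarse)):
        for i in range(len(h)):          # i plays the role of k - 2m
            k = (2 * m + i) % n
            c_fine[k] += c_coarse[m] * h[i] + d_coarse[m] * h1[i]
    return c_fine

s = 1.0 / math.sqrt(2.0)
haar_h = [s, s]        # Haar scaling coefficients h(n)
haar_h1 = [s, -s]      # Haar wavelet coefficients h_1(n)

c_coarse = [6.0 * s, 10.0 * s]    # hypothetical c_j(m)
d_coarse = [2.0 * s, 0.0]         # hypothetical d_j(m)
c_fine = synthesis_step(c_coarse, d_coarse, haar_h, haar_h1)   # close to [4, 2, 5, 5]
```

These coarse coefficients are the Haar analysis of the sequence 4, 2, 5, 5, so the synthesis stage returns that sequence up to rounding.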


Filtering and Up-Sampling or Stretching 


For synthesis in the filter bank we have a sequence of first up-sampling or stretching, then filtering. This means 
that the input to the filter has zeros inserted between each of the original terms. In other words, 
Equation: 
$$y(2n) = x(n) \quad \text{and} \quad y(2n + 1) = 0$$

where the input signal is stretched to twice its original length and zeros are inserted. Clearly this up-sampling or 
stretching could be done with factors other than two, and the two equations above could have the $x(n)$ and 0 
reversed. It is also clear that up-sampling does not lose any information. If you first up-sample then down-sample, 
you are back where you started. However, if you first down-sample then up-sample, you are not 
generally back where you started. 
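
The asymmetry between the two orders of operation is easy to check on a short sequence. This is a minimal sketch with illustrative names (`upsample`, `downsample`) and hypothetical data.

```python
def upsample(x):
    """Insert a zero after every term: y(2n) = x(n), y(2n + 1) = 0."""
    y = []
    for value in x:
        y.extend([value, 0.0])
    return y

def downsample(x):
    """Keep the even-indexed terms: y(n) = x(2n)."""
    return x[::2]

x = [1.0, 2.0, 3.0, 4.0]
up_then_down = downsample(upsample(x))   # [1.0, 2.0, 3.0, 4.0], the original
down_then_up = upsample(downsample(x))   # [1.0, 0.0, 3.0, 0.0], odd terms lost
```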


Our reason for discussing filtering and up-sampling here is that is exactly what the synthesis operation [link] 
does. This equation is evaluated by up-sampling the $j$ scale coefficient sequence $c_j(k)$, which means double its 
length by inserting zeros between each term, then convolving it with the scaling coefficients $h(n)$. The same is 
done to the $j$ level wavelet coefficient sequence and the results are added to give the $j + 1$ level scaling function 
coefficients. This structure is illustrated in [link] where $g_0(n) = h(n)$ and $g_1(n) = h_1(n)$. This combining 
process can be continued to any level by combining the appropriate scale wavelet coefficients. The resulting 
two-scale tree is shown in [link]. 
Two-Band Synthesis Bank 


Two-Stage Two-Band Synthesis Tree 


Input Coefficients 


One might wonder how the input set of scaling coefficients $c_{j+1}$ are obtained from the signal to use in the 
systems of [link] and [link]. For high enough scale, the scaling functions act as “delta functions” with the inner 
product to calculate the high scale coefficients as simply a sampling of $f(t)$ [link], [link]. If the samples of $f(t)$ 
are above the Nyquist rate, they are good approximations to the scaling coefficients at that scale, meaning no 
wavelet coefficients are necessary at that scale. This approximation is particularly good if moments of the 
scaling function are zero or small. These ideas are further explained in Section: Approximation of Scaling 
Coefficients by Samples of the Signal and Chapter: Calculation of the Discrete Wavelet Transform. 


An alternative approach is to “prefilter” the signal samples to make them a better approximation to the 
expansion coefficients. This is discussed in [link]. 


This set of analysis and synthesis operations is known as Mallat's algorithm [link], [link]. The analysis filter 
bank efficiently calculates the DWT using banks of digital filters and down-samplers, and the synthesis filter 
bank calculates the inverse DWT to reconstruct the signal from the transform. Although presented here as a 
method of calculating the DWT, the filter bank description also gives insight into the transform itself and 
suggests modifications and generalizations that would be difficult to see directly from the wavelet expansion 
point of view. Filter banks will be used more extensively in the remainder of this book. A more general 


Although a pure wavelet expansion is possible as indicated in [link] and [link], properties of the wavelet are best 
developed and understood through the scaling function. This is certainly true if the scaling function has compact 
support because then the wavelet is composed of a finite sum of scaling functions given in [link]. 


In a practical situation where the wavelet expansion or transform is being used as a computational tool in signal 
processing or numerical analysis, the expansion can be made finite. If the basis functions have finite support, 
only a finite number of additions over $k$ are necessary. If the scaling function is included as indicated in [link] or 
[link], the lower limit on the summation over $j$ is finite. If the signal is essentially bandlimited, there is a scale 
above which there is little or no energy and the upper limit can be made finite. That is described in Chapter: 
Calculation of the Discrete Wavelet Transform. 


Lattices and Lifting 


An alternative to using the basic two-band tree-structured filter bank is a lattice-structured filter bank. Because 
of the relationship between the scaling filter $h(n)$ and the wavelet filter $h_1(n)$ given in [link], some of the 
calculation can be done together with a significant savings in arithmetic. This is developed in Chapter: 
Calculation of the Discrete Wavelet Transform [link]. 


Still another approach to the calculation of discrete wavelet transforms and to the calculations of the scaling 
functions and wavelets themselves is called “lifting” [link], [link]. Although it is related to several other schemes 
[link], [link], [link], [link], this idea was first explained by Wim Sweldens as a time-domain construction based 
on interpolation [link]. Lifting does not use Fourier methods and can be applied to more general problems (e.g., 
nonuniform sampling) than the approach in this chapter. It was first applied to the biorthogonal system [link] 
and then extended to orthogonal systems [link]. The application of lifting to biorthogonal systems is introduced in 
Section: Biorthogonal Wavelet Systems later in this book. Implementations based on lifting also achieve the 
same improvement in arithmetic efficiency as the lattice structure does. 


Different Points of View 


Multiresolution versus Time-Frequency Analysis 


The development of wavelet decomposition and the DWT has thus far been in terms of multiresolution where 
the higher scale wavelet components are considered the “detail” on a lower scale signal or image. This is indeed 
a powerful point of view and an accurate model for many signals and images, but there are other cases where the 
components of a composite signal at different scales and/or time are independent or, at least, not details of each 
other. If you think of a musical score as a wavelet decomposition, the higher frequency notes are not details on a 
lower frequency note; they are independent notes. This second point of view is more one of the time-frequency 
or time-scale analysis methods [link], [link], [link], [link], [link] and may be better developed with wavelet 
packets (see Section: Wavelet Packets), M-band wavelets (see Section: Multiplicity-M (M-band) Scaling 
Functions and Wavelets), or a redundant representation (see Section: Overcomplete Representations, Frames, 
Redundant Transforms, and Adaptive Bases), but would still be implemented by some sort of filter bank. 


Periodic versus Nonperiodic Discrete Wavelet Transforms 


Unlike the Fourier series, the DWT can be formulated as a periodic or a nonperiodic transform. Up until now, 
we have considered a nonperiodic series expansion [link] over $-\infty < t < \infty$ with the calculations made by the 
filter banks being an on-going string of coefficients at each of the scales. If the input to the filter bank has a 
certain rate, the output at the next lower scale will be two sequences, one of scaling function coefficients 
$c_{j-1,k-1}$ and one of wavelet coefficients $d_{j-1,k-1}$, each, after down-sampling, being at half the rate of the input. 
At the next lower scale, the same process is done on the scaling coefficients to give a total output of three 
strings, one at half rate and two at quarter rate. In other words, the calculation of the wavelet transform 
coefficients is a multirate filter bank producing sequences of coefficients at different rates but with the average 
number at any stage being the same. This approach can be applied to any signal, finite or infinite in length, 
periodic or nonperiodic. Note that while the average output rate is the same as the average input rate, the number 
of output coefficients is greater than the number of input coefficients because the length of the output of 
convolution is greater than the length of the input. 


An alternative formulation that can be applied to finite duration signals or periodic signals (much as the Fourier 
series) is to make all of the filter bank filtering operations cyclic or periodic convolutions, defined by 
Equation: 
$$y(n) = \sum_{\ell=0}^{N-1} h(\ell)\,x(n - \ell),$$

for $n, \ell = 0, 1, \cdots, N - 1$, where all indices and arguments are evaluated modulo $N$. For a length $N$ input at scale 
$j = J$, we have after one stage two length $N/2$ sequences, after two stages, one length $N/2$ and two length 
$N/4$ sequences, and so on. If $N = 2^J$, this can be repeated $J$ times with the last stage being length one; one 
scaling function coefficient and one wavelet coefficient. An example of the periodic DWT of a length-8 signal can 
be seen in [link]. 
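
A periodic DWT of a length-8 signal can be sketched by evaluating every index modulo the current sequence length. The example below is an illustration under assumptions: it uses the four Daubechies coefficients, forms the wavelet filter by the common alternating flip $h_1(n) = (-1)^n h(3 - n)$ (the text refers to this relationship only by link), and the name `periodic_dwt` is hypothetical.

```python
import math

SQ3 = math.sqrt(3.0)
DENOM = 4.0 * math.sqrt(2.0)
# Length-4 Daubechies scaling coefficients (assumed example filter)
H = [(1 + SQ3) / DENOM, (3 + SQ3) / DENOM, (3 - SQ3) / DENOM, (1 - SQ3) / DENOM]
# Wavelet coefficients from the alternating flip h1(n) = (-1)**n * h(3 - n) (assumption)
H1 = [H[3], -H[2], H[1], -H[0]]

def periodic_dwt(x, h, h1):
    """Periodic DWT of a power-of-two length signal; all indices are taken modulo the length.

    Returns [c, d_coarsest, ..., d_finest] flattened into one list of len(x) terms.
    """
    c = list(x)
    details = []
    while len(c) > 1:
        n = len(c)
        coarse = [sum(h[i] * c[(2 * k + i) % n] for i in range(len(h))) for k in range(n // 2)]
        detail = [sum(h1[i] * c[(2 * k + i) % n] for i in range(len(h1))) for k in range(n // 2)]
        details.insert(0, detail)      # keep the coarsest detail first
        c = coarse
    out = list(c)
    for d in details:
        out.extend(d)
    return out

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]   # hypothetical length-8 signal
w = periodic_dwt(x, H, H1)                      # 1 + 1 + 2 + 4 = 8 coefficients
flat = periodic_dwt([1.0] * 8, H, H1)           # constant input: only the first term is nonzero
```

The output has exactly as many terms as the input, ordered from the single scaling coefficient through the coarsest to the finest wavelet coefficients.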


$[\,c_j(k),\ d_j(k),\ d_{j+1}(k),\ d_{j+1}(k+1),\ d_{j+2}(k),\ d_{j+2}(k+1),\ d_{j+2}(k+2),\ d_{j+2}(k+3)\,]$ 


The length-8 DWT vector 


The details of this periodic approach are developed in Chapter: Calculation of the Discrete Wavelet Transform 
showing the aliasing that takes place in this system because of the cyclic convolution [link]. This formulation is 
particularly clean because there are the same number of terms in the transform as in the signal. It can be 
represented by a square matrix with a simple inverse that has interesting structure. It can be efficiently 
calculated by an FFT although that is not needed for most applications. 


For most of the theoretical developments or for conceptual purposes, there is little difference in these two 
formulations. However, for actual calculations and in applications, you should make sure you know which one 
you want or which one your software package calculates. As for the Fourier case, you can use the periodic form 
to calculate the nonperiodic transform by padding the signal with zeros but that wastes some of the efficiency 
that the periodic formulation was set up to provide. 


The Discrete Wavelet Transform versus the Discrete-Time Wavelet Transform 


Two more points of view concern looking at the signal processing methods in this book as based on an 
expansion of a signal or on multirate digital filtering. One can look at Mallat's algorithm either as a way of 
calculating expansion coefficients at various scales or as a filter bank for processing discrete-time signals. The 
first is analogous to use of the Fourier series (FS) where a continuous function is transformed into a discrete 
sequence of coefficients. The second is analogous to the discrete Fourier transform (DFT) where a discrete 
function is transformed into a discrete function. Indeed, the DFT (through the FFT) is often used to calculate the 
Fourier series coefficients, but care must be taken to avoid or minimize aliasing. The difference in these views 
comes partly from the background of the various researchers (i.e., whether they are “wavelet people” or “filter 
bank people”). However, there are subtle differences between using the series expansion of the signal (using the 
discrete wavelet transform (DWT)) and using a multirate digital filter bank on samples of the signal (using the 
discrete-time wavelet transform (DTWT)). Generally, using both views gives more insight into a problem than 
either achieves alone. The series expansion is the main approach of this book but filter banks and the DTWT are 
also developed in Section: Discrete Multiresolution Analysis and the Discrete-Time Wavelet Transform and 
Chapter: Filter Banks and Transmultiplexers. 


Numerical Complexity of the Discrete Wavelet Transform 


Analysis of the number of mathematical operations (floating-point multiplications and additions) shows that 
calculating the DTWT of a length-$N$ sequence of numbers using Mallat's algorithm with filter banks requires 
$O(N)$ operations. In other words, the number of operations is linear with the length of the signal. What is more, 
the constant of linearity is relatively small. This is in contrast to the FFT algorithm for calculating the DFT 
where the complexity is $O(N \log(N))$, or to calculating a DFT directly, which requires $O(N^2)$ operations. It is often 
said that the FFT algorithm is based on a “divide and conquer” scheme, but that is misleading. The process is 
better described as an “organize and share” scheme. The efficiency (in fact, optimal efficiency) is based on 
organizing the calculations so that redundant operations can be shared. The cascaded filtering (convolution) and 
down-sampling of Mallat's algorithm do the same thing. 
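
The linear growth can be seen by simply counting multiplications stage by stage. The sketch below assumes two filters of the same length at each stage, a power-of-two signal length, and the illustrative name `mallat_multiplications`; it counts multiplications only and ignores additions.

```python
def mallat_multiplications(n, filter_length):
    """Count multiplications for a full analysis tree on a length-n signal (n a power of two).

    Each stage produces n/2 scaling and n/2 wavelet outputs, and each output
    costs filter_length multiplications. The next stage sees n/2 inputs.
    """
    total = 0
    while n > 1:
        total += 2 * (n // 2) * filter_length
        n //= 2
    return total

count_1024 = mallat_multiplications(1024, 4)   # 8184, just under 2 * 1024 * 4
count_2048 = mallat_multiplications(2048, 4)   # 16376, about twice count_1024
```

Because the stage sizes form the geometric series $N + N/2 + N/4 + \cdots$, the total stays below $2NL$ for filter length $L$ no matter how many stages are used.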


One should not make too much of this difference between the complexity of the FFT and DTWT. It comes from 
the DTWT having a logarithmic division of frequency bands and the FFT having a uniform division. This 
logarithmic scale is appropriate for many signals but if a uniform division is used for the wavelet system such as 
is done for wavelet packets (see Section: Wavelet Packets) or the redundant DWT (see Section: Overcomplete 
Representations, Frames, Redundant Transforms, and Adaptive Bases), the complexity of the wavelet system 
becomes $O(N \log(N))$. 


If you are interested in more details of the discrete wavelet transform and the discrete-time wavelet transform, 
relations between them, methods of calculating them, further properties of them, or examples, see Section: 
Discrete Multiresolution Analysis and the Discrete-Time Wavelet Transform and Chapter: Calculation of the Discrete Wavelet 
Transform. 


Bases, Orthogonal Bases, Biorthogonal Bases, Frames, Tight Frames, and Unconditional Bases 
Development of ideas of vector expansion 


Most people with technical backgrounds are familiar with the ideas of expansion vectors or 
basis vectors and of orthogonality; however, the related concepts of biorthogonality or of 
frames and tight frames are less familiar but also important. In the study of wavelet systems, 
we find that frames and tight frames are needed and should be understood, at least at a 
superficial level. One can find details in [link], [link], [link], [link], [link]. Another perhaps 
unfamiliar concept is that of an unconditional basis used by Donoho, Daubechies, and others 
[link], [link], [link] to explain why wavelets are good for signal compression, detection, and 
denoising [link], [link]. In this chapter, we will very briefly define and discuss these ideas. At 
this point, you may want to skip these sections and perhaps refer to them later when they are 
specifically needed. 


Bases, Orthogonal Bases, and Biorthogonal Bases 


A set of vectors or functions $f_k(t)$ spans a vector space $F$ (or $F$ is the Span of the set) if any 
element of that space can be expressed as a linear combination of members of that set, 
meaning: Given the finite or infinite set of functions $f_k(t)$, we define $\operatorname{Span}_k \{f_k\} = F$ as the 
vector space with all elements of the space of the form 

Equation: 
$$g(t) = \sum_k a_k\,f_k(t)$$

with $k \in \mathbf{Z}$ and $t, a \in \mathbf{R}$. An inner product is usually defined for this space and is denoted 
$\langle f(t), g(t) \rangle$. A norm is defined and is denoted by $\|f\| = \sqrt{\langle f, f \rangle}$. 


We say that the set $f_k(t)$ is a basis set or a basis for a given space $F$ if the set of $\{a_k\}$ in 
[link] are unique for any particular $g(t) \in F$. The set is called an orthogonal basis if 
$\langle f_k(t), f_\ell(t) \rangle = 0$ for all $k \neq \ell$. If we are in three dimensional Euclidean space, orthogonal 
basis vectors are coordinate vectors that are at right (90°) angles to each other. We say the set 
is an orthonormal basis if $\langle f_k(t), f_\ell(t) \rangle = \delta(k - \ell)$, i.e., if, in addition to being orthogonal, 
the basis vectors are normalized to unity norm: $\|f_k(t)\| = 1$ for all $k$. 


From these definitions it is clear that if we have an orthonormal basis, we can express any 
element in the vector space, $g(t) \in F$, written as [link] by 
Equation: 
$$g(t) = \sum_k \langle g(t), f_k(t) \rangle\,f_k(t)$$

since by taking the inner product of $f_k(t)$ with both sides of [link], we get 

Equation: 
$$a_k = \langle g(t), f_k(t) \rangle$$


where this inner product of the signal $g(t)$ with the basis vector $f_k(t)$ “picks out” the 
corresponding coefficient $a_k$. This expansion formulation or representation is extremely 
valuable. It expresses [link] as an identity operator in the sense that the inner product operates 
on $g(t)$ to produce a set of coefficients that, when used to linearly combine the basis vectors, 
gives back the original signal $g(t)$. It is the foundation of Parseval's theorem which says the 
norm or energy can be partitioned in terms of the expansion coefficients $a_k$. It is why the 
interpretation, storage, transmission, approximation, compression, and manipulation of the 
coefficients can be very useful. Indeed, [link] is the form of all Fourier type methods. 


Although the advantages of an orthonormal basis are clear, there are cases where the basis 
system dictated by the problem is not and cannot (or should not) be made orthogonal. For 
these cases, one can still have the expression of [link] and one similar to [link] by using a dual 
basis set $\tilde{f}_k(t)$ whose elements are not orthogonal to each other, but to the corresponding 
element of the expansion set 
Equation: 
$$\langle f_\ell(t), \tilde{f}_k(t) \rangle = \delta(\ell - k)$$

Because this type of “orthogonality” requires two sets of vectors, the expansion set and the 
dual set, the system is called biorthogonal. Using [link] with the expansion in [link] gives 
Equation: 
$$g(t) = \sum_k \langle g(t), \tilde{f}_k(t) \rangle\,f_k(t)$$


Although a biorthogonal system is more complicated in that it requires, not only the original 
expansion set, but the finding, calculating, and storage of a dual set of vectors, it is very 
general and allows a larger class of expansions. There may, however, be greater numerical 
problems with a biorthogonal system if some of the basis vectors are strongly correlated. 


The calculation of the expansion coefficients using an inner product in [link] is called the 
analysis part of the complete process, and the calculation of the signal from the coefficients 
and expansion vectors in [link] is called the synthesis part. 


In finite dimensions, analysis and synthesis operations are simply matrix-vector 
multiplications. If the expansion vectors in [link] are a basis, the synthesis matrix has these 
basis vectors as columns and the matrix is square and nonsingular. If the matrix is orthogonal, 
its rows and columns are orthogonal, its inverse is its transpose, and the identity operator is 
simply the matrix multiplied by its transpose. If it is not orthogonal, then the identity is the 
matrix multiplied by its inverse and the dual basis consists of the rows of the inverse. If the 
matrix is singular, then its columns are not independent and, therefore, do not form a basis. 


Matrix Examples 


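Using a four dimensional space with matrices to illustrate the ideas of this chapter, the 
synthesis formula $g(t) = \sum_k a_k f_k(t)$ becomes 


Equation: 
$$\begin{bmatrix} g(0) \\ g(1) \\ g(2) \\ g(3) \end{bmatrix}
= a_0 \begin{bmatrix} f_0(0) \\ f_0(1) \\ f_0(2) \\ f_0(3) \end{bmatrix}
+ a_1 \begin{bmatrix} f_1(0) \\ f_1(1) \\ f_1(2) \\ f_1(3) \end{bmatrix}
+ a_2 \begin{bmatrix} f_2(0) \\ f_2(1) \\ f_2(2) \\ f_2(3) \end{bmatrix}
+ a_3 \begin{bmatrix} f_3(0) \\ f_3(1) \\ f_3(2) \\ f_3(3) \end{bmatrix}$$


which can be compactly written in matrix form as 


Equation: 
$$\begin{bmatrix} g(0) \\ g(1) \\ g(2) \\ g(3) \end{bmatrix}
= \begin{bmatrix}
f_0(0) & f_1(0) & f_2(0) & f_3(0) \\
f_0(1) & f_1(1) & f_2(1) & f_3(1) \\
f_0(2) & f_1(2) & f_2(2) & f_3(2) \\
f_0(3) & f_1(3) & f_2(3) & f_3(3)
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix}$$


The synthesis or expansion [link] or [link] becomes 
Equation: 
$$\mathbf{g} = F\,\mathbf{a},$$
with the left-hand column vector $\mathbf{g}$ being the signal vector, the matrix $F$ formed with the basis 
vectors $\mathbf{f}_k$ as columns, and the right-hand vector $\mathbf{a}$ containing the four expansion coefficients, 
$a_k$. 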


The equation for calculating the k’” expansion coefficient in [link] is 
Equation: 


which can be written in vector form as 
Equation: 


ay fo (0) fod) fol2) fo(3) (0) 
a — fi(0.) AG) AQ) AGB) 9) 
4 2 (0) fa(l) fa(2) fo(3) 9) 
* FO ha) he ke © 


where each ay, is an inner product of the k“” row of FT with g and analysis or coefficient 
[link] or [link] becomes 


Equation: 
a= FT g 

which together are [link] or 
Equation: 

g=FF*g 
Therefore, 
Equation: 

FT — F? 


is how the dual basis in [link] is found. 


If the columns of $F$ are orthogonal and normalized, then 
Equation: 
$$F F^T = I.$$
This means the basis and dual basis are the same, and [link] and [link] become 
Equation: 
$$g = F F^T g$$
and 
Equation: 
$$\tilde{F}^T = F^T$$


which are both simpler and more numerically stable than [link]. 
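As a small illustration of the synthesis and analysis relations above, the following sketch uses a hypothetical two-dimensional, non-orthogonal basis (the matrix `F`, the signal `g`, and the helper names are our own choices for illustration, not taken from the text). The rows of the inverse play the role of the dual basis.

```python
# Hypothetical 2-D example: the basis vectors are the columns of F.
F = [[1.0, 1.0],
     [0.0, 1.0]]

def inverse_2x2(m):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("columns are dependent, so they do not form a basis")
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    """Matrix-vector product for nested-list matrices."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

F_dual_T = inverse_2x2(F)   # rows of the inverse are the dual basis vectors
g = [3.0, 2.0]              # signal vector
a = matvec(F_dual_T, g)     # analysis:  a = F~^T g
g_rec = matvec(F, a)        # synthesis: g = F a
```

For an orthonormal basis the call to `inverse_2x2` could be replaced by a transpose, which is the simplification noted above.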


The discrete Fourier transform (DFT) is an interesting example of a finite dimensional Fourier 
transform with orthogonal basis vectors where matrix and vector techniques can be 
informative as to the DFT's characteristics and properties. That can be found developed in 
several signal processing books. 


Fourier Series Example 


The Fourier Series is an excellent example of an infinite dimensional composition (synthesis) 
and decomposition (analysis). The expansion formula for an even function $g(t)$ over 
$0 < t < 2\pi$ is 

Equation: 
$$g(t) = \sum_k a_k \cos(kt)$$


where the basis vectors (functions) are 
Equation: 
$$f_k(t) = \cos(kt)$$


and the expansion coefficients are obtained as 
Equation: 
$$a_k = \langle g(t), f_k(t) \rangle = \frac{2}{\pi} \int_0^{\pi} g(t) \cos(kt)\, dt.$$


The basis vector set is easily seen to be orthonormal by verifying 
Equation: 
$$\langle f_\ell(t), f_k(t) \rangle = \delta(k - \ell).$$


These basis functions span an infinite dimensional vector space and the convergence of [link] 
must be examined. Indeed, it is the robustness of that convergence that is discussed in this 
section under the topic of unconditional bases. 
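The orthonormality claim for the cosine basis can be checked numerically. The sketch below assumes the inner product $\langle f, g \rangle = \frac{2}{\pi} \int_0^{\pi} f(t) g(t)\, dt$ and restricts attention to $k \ge 1$ (with this normalization the constant term $k = 0$ would need separate scaling). The midpoint rule, the test signal, and all function names are our own illustrative choices.

```python
import math

def inner(f, g, n=2000):
    """Midpoint-rule approximation of (2/pi) * integral over [0, pi] of f(t) g(t)."""
    dt = math.pi / n
    total = sum(f((j + 0.5) * dt) * g((j + 0.5) * dt) for j in range(n))
    return (2.0 / math.pi) * total * dt

def cos_basis(k):
    """Basis function f_k(t) = cos(k t)."""
    return lambda t: math.cos(k * t)

# Gram matrix of f_1 ... f_4; it should be close to the identity.
gram = [[inner(cos_basis(k), cos_basis(l)) for l in range(1, 5)]
        for k in range(1, 5)]

# Analysis of an even test signal built from two of the basis functions.
g = lambda t: 3.0 * math.cos(t) - 0.5 * math.cos(3 * t)
coeffs = [inner(g, cos_basis(k)) for k in range(1, 5)]
```

The recovered `coeffs` are simply the weights used to build the test signal, which is the analysis step for an orthonormal expansion.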


Sinc Expansion Example 


Another example of an infinite dimensional orthogonal basis is Shannon's sampling expansion 
[link]. If $f(t)$ is band limited, then 
Equation: 
$$f(t) = \sum_k f(Tk)\, \frac{\sin\left(\frac{\pi}{T} t - \pi k\right)}{\frac{\pi}{T} t - \pi k}$$

for a sampling interval $T < \pi/W$ if the spectrum of $f(t)$ is zero for $|\omega| > W$. In this case the 
basis functions are the sinc functions with coefficients which are simply samples of the 
original function. This means the inner product of a sinc basis function with a bandlimited 
function will give a sample of that function. It is easy to see that the sinc basis functions are 
orthogonal by taking the inner product of two sinc functions which will sample one of them at 
the points of value one or zero. 
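The sampling property of the sinc basis can be seen in a few lines. This is only a sketch: the sampling interval `T`, the test signal (a finite combination of two sinc functions, so that a finite sum reconstructs it), and the helper names are assumptions made for illustration.

```python
import math

def sinc_basis(t, k, T):
    """Shannon basis function centered at t = T*k; it is zero at the other sample points."""
    x = math.pi * (t / T - k)
    return 1.0 if x == 0.0 else math.sin(x) / x

T = 0.5  # hypothetical sampling interval

def f(t):
    """Band-limited test signal: a combination of two basis functions."""
    return 2.0 * sinc_basis(t, 0, T) - 1.0 * sinc_basis(t, 3, T)

ks = range(-5, 9)
samples = {k: f(T * k) for k in ks}   # expansion coefficients are just samples

def reconstruct(t):
    """Synthesis: sum of samples times shifted sinc functions."""
    return sum(samples[k] * sinc_basis(t, k, T) for k in ks)
```

Because each basis function is one at its own sample point and zero at all the others, the samples of `f` are (up to rounding) the two weights used to build it, and `reconstruct` agrees with `f` between the sample points.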


Frames and Tight Frames 


While the conditions for a set of functions being an orthonormal basis are sufficient for the 
representation in [link] and the requirement of the set being a basis is sufficient for [link], they 
are not necessary. To be a basis requires uniqueness of the coefficients. In other words it 
requires that the set be independent, meaning no element can be written as a linear 
combination of the others. 


If the set of functions or vectors is dependent and yet does allow the expansion described in 
[link], then the set is called a frame [link]. Thus, a frame is a spanning set. The term frame 
comes from a definition that requires finite limits on an inequality bound [link], [link] of inner 
products. 


If we want the coefficients in an expansion of a signal to represent the signal well, these 
coefficients should have certain properties. They are stated best in terms of energy and energy 
bounds. For an orthogonal basis, this takes the form of Parseval's theorem. To be a frame in a 
signal space, an expansion set $\varphi_k(t)$ must satisfy 

Equation: 
$$A \|g\|^2 \le \sum_k |\langle \varphi_k, g \rangle|^2 \le B \|g\|^2$$


for some $0 < A$ and $B < \infty$ and for all signals $g(t)$ in the space. Dividing [link] by $\|g\|^2$ 
shows that $A$ and $B$ are bounds on the normalized energy of the inner products. They "frame" 
the normalized coefficient energy. If 

Equation: 
$$A = B$$


then the expansion set is called a tight frame. This case gives 
Equation: 
$$A \|g\|^2 = \sum_k |\langle \varphi_k, g \rangle|^2$$


which is a generalized Parseval's theorem for tight frames. If $A = B = 1$, the tight frame 
becomes an orthogonal basis. From this, it can be shown that for a tight frame [link] 
Equation: 
$$g(t) = A^{-1} \sum_k \langle \varphi_k(t), g(t) \rangle\, \varphi_k(t)$$


which is the same as the expansion using an orthonormal basis except for the $A^{-1}$ term which 
is a measure of the redundancy in the expansion set. 


If an expansion set is a non-tight frame, there is no strict Parseval's theorem and the energy in 
the transform domain cannot be exactly partitioned. However, the closer $A$ and $B$ are, the 
better an approximate partitioning can be done. If $A = B$, we have a tight frame and the 
partitioning can be done exactly with [link]. Daubechies [link] shows that the tighter the 
frame bounds in [link] are, the better the analysis and synthesis system is conditioned. In 
other words, if $A$ is near zero and/or $B$ is very large compared to $A$, there will be 
numerical problems in the analysis-synthesis calculations. 
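For a finite set of vectors, the tightest frame bounds are the smallest and largest eigenvalues of $F F^T$, where the frame vectors are the columns of $F$. The sketch below works in two dimensions so the eigenvalues have a closed form; the two example frames and the function names are hypothetical choices for illustration.

```python
import math

def frame_operator(vectors):
    """Entries of the symmetric 2x2 matrix S = F F^T for 2-D frame vectors."""
    sxx = sum(x * x for x, _ in vectors)
    sxy = sum(x * y for x, y in vectors)
    syy = sum(y * y for _, y in vectors)
    return sxx, sxy, syy

def frame_bounds(vectors):
    """Smallest and largest eigenvalue of S, i.e. the tightest bounds A and B."""
    sxx, sxy, syy = frame_operator(vectors)
    mean = 0.5 * (sxx + syy)
    radius = math.hypot(0.5 * (sxx - syy), sxy)
    return mean - radius, mean + radius

# A frame that is not tight: the standard basis plus one extra vector.
loose = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
# Three unit vectors spaced 120 degrees apart: a tight frame.
tight = [(math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3))
         for k in range(3)]

A_loose, B_loose = frame_bounds(loose)
A_tight, B_tight = frame_bounds(tight)
```

The ratio `B / A` acts as a condition number for the analysis and synthesis calculations; for the tight example it is one and the common bound 3/2 is the redundancy of three vectors in two dimensions.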


Frames are an over-complete version of a basis set, and tight frames are an over-complete 
version of an orthogonal basis set. If one is using a frame that is neither a basis nor a tight 
frame, a dual frame set can be specified so that analysis and synthesis can be done as for a 
non-orthogonal basis. If a tight frame is being used, the mathematics is very similar to using 
an orthogonal basis. The Fourier type system in [link] is essentially the same as [link], and 
[link] is essentially a Parseval's theorem. 


The use of frames and tight frames rather than bases and orthogonal bases means a certain 
amount of redundancy exists. In some cases, redundancy is desirable in giving a robustness to 
the representation so that errors or faults are less destructive. In other cases, redundancy is an 
inefficiency and, therefore, undesirable. The concept of a frame originates with Duffin and 
Schaeffer [link] and is discussed in [link], [link], [link]. In finite dimensions, vectors can 
always be removed from a frame to get a basis, but in infinite dimensions, that is not always 
possible. 


An example of a frame in finite dimensions is a matrix with more columns than rows but with 
independent rows. An example of a tight frame is a similar matrix with orthogonal rows. An 
example of a tight frame in infinite dimensions would be an over-sampled Shannon 
expansion. It is informative to examine this example. 


Matrix Examples 


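An example of a frame of four expansion vectors $f_k$ in a three-dimensional space would be 
Equation: 
$$\begin{bmatrix} g(0) \\ g(1) \\ g(2) \end{bmatrix} = \begin{bmatrix} f_0(0) & f_1(0) & f_2(0) & f_3(0) \\ f_0(1) & f_1(1) & f_2(1) & f_3(1) \\ f_0(2) & f_1(2) & f_2(2) & f_3(2) \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix}$$


which corresponds to the basis shown in the square matrix in [link]. The corresponding 
analysis equation is 


Equation: 
$$\begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix} = \begin{bmatrix} \tilde{f}_0(0) & \tilde{f}_0(1) & \tilde{f}_0(2) \\ \tilde{f}_1(0) & \tilde{f}_1(1) & \tilde{f}_1(2) \\ \tilde{f}_2(0) & \tilde{f}_2(1) & \tilde{f}_2(2) \\ \tilde{f}_3(0) & \tilde{f}_3(1) & \tilde{f}_3(2) \end{bmatrix} \begin{bmatrix} g(0) \\ g(1) \\ g(2) \end{bmatrix}$$


which corresponds to [link]. One can calculate a set of dual frame vectors by temporarily 
appending an arbitrary independent row to [link], making the matrix square, then using the 
first three columns of the inverse as the dual frame vectors. This clearly illustrates the dual 
frame is not unique. Daubechies [link] shows how to calculate an "economical" unique dual 
frame. 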


The tight frame system occurs in wavelet infinite expansions as well as other finite and 
infinite dimensional systems. A numerical example of a frame which is a normalized tight 
frame with four vectors in three dimensions is 


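Equation: 
$$\begin{bmatrix} g(0) \\ g(1) \\ g(2) \end{bmatrix} = \frac{1}{A} \frac{1}{\sqrt{3}} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix}$$


which includes the redundancy factor from [link]. Note the rows are orthogonal and the 
columns are normalized, which gives 
Equation: 
$$F F^T = \frac{1}{\sqrt{3}} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \end{bmatrix} \frac{1}{\sqrt{3}} \begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 1 \\ 1 & 1 & -1 \\ 1 & -1 & -1 \end{bmatrix} = \frac{4}{3} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \frac{4}{3} I$$
or 
Equation: 
$$g = \frac{1}{A} F F^T g$$


which is the matrix form of [link]. The factor of $A = 4/3$ is the measure of redundancy in 
this tight frame using four expansion vectors in a three-dimensional space. 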


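The identity for the expansion coefficients is 
Equation: 
$$a = \frac{1}{A} F^T F a$$


which for the numerical example gives 


Equation: 
$$F^T F = \frac{1}{\sqrt{3}} \begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 1 \\ 1 & 1 & -1 \\ 1 & -1 & -1 \end{bmatrix} \frac{1}{\sqrt{3}} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 1/3 & 1/3 & -1/3 \\ 1/3 & 1 & -1/3 & 1/3 \\ 1/3 & -1/3 & 1 & 1/3 \\ -1/3 & 1/3 & 1/3 & 1 \end{bmatrix}$$


Although this is not a general identity operator, once scaled by $1/A$ it is an identity operator 
over the three-dimensional subspace that $a$ is in, and its unit diagonal illustrates the unity 
norm of the rows of $F^T$ and columns of $F$. 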
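The numerical tight frame example can be checked directly. The sketch below assumes the matrix $F = \frac{1}{\sqrt{3}} [1\ 1\ 1\ 1;\ 1\ {-1}\ 1\ {-1};\ 1\ 1\ {-1}\ {-1}]$, which is the arrangement of signs consistent with the Gram matrix $F^T F$ quoted in the text; the test signal `g` and the helper names are our own.

```python
import math

s = 1.0 / math.sqrt(3.0)
F = [[s,  s,  s,  s],
     [s, -s,  s, -s],
     [s,  s, -s, -s]]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]

FT = transpose(F)
FFT = matmul(F, FT)    # 3x3, should equal (4/3) I
FTF = matmul(FT, F)    # 4x4 Gram matrix of the four frame vectors
A = FFT[0][0]          # redundancy factor

g = [1.0, -2.0, 0.5]                          # arbitrary test signal
a = matvec(FT, g)                             # analysis: a = F^T g
g_rec = [x / A for x in matvec(F, a)]         # synthesis: g = (1/A) F a
a_again = [x / A for x in matvec(FTF, a)]     # (1/A) F^T F acts as identity on a
```

Note that `FTF` alone is not the identity; only after division by `A`, and only on coefficient vectors of the form $F^T g$, does it return its input.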


If the redundancy measure A in [link] and [link] is one, the matrices must be square and the 
system has an orthonormal basis. 


Frames are over-complete versions of non-orthogonal bases and tight frames are over- 
complete versions of orthonormal bases. Tight frames are important in wavelet analysis 
because the restrictions on the scaling function coefficients discussed in Chapter: The Scaling 
Function and Scaling Coefficients, Wavelet and Wavelet Coefficients guarantee not that the 
wavelets will be a basis, but a tight frame. In practice, however, they are usually a basis. 


Sinc Expansion as a Tight Frame Example 


An example of an infinite-dimensional tight frame is the generalized Shannon's sampling 
expansion for the over-sampled case [link]. If a function is over-sampled but the sinc 
functions remain consistent with the upper spectral limit $W$, the sampling theorem becomes 
Equation: 
$$g(t) = \frac{TW}{\pi} \sum_n g(Tn)\, \frac{\sin((t - Tn) W)}{(t - Tn) W}$$
or using $R$ as the amount of over-sampling 
Equation: 
$$RW = \frac{\pi}{T}, \quad \text{for } R \ge 1$$

we have 
Equation: 
$$g(t) = \frac{1}{R} \sum_n g(Tn)\, \frac{\sin\left(\frac{\pi}{RT} (t - Tn)\right)}{\frac{\pi}{RT} (t - Tn)}$$


where the sinc functions are no longer orthogonal. In fact, they are no longer a basis as 
they are not independent. They are, however, a tight frame and, therefore, act as though they 
were an orthogonal basis but now there is a "redundancy" factor $R$ as a multiplier in the 
formula. 


Notice that as $R$ is increased from unity, [link] starts as [link], where each sample occurs 
where the sinc function is one or zero, but becomes an expansion in which the shifts are still 
$t = Tn$ while the sinc functions become wider, so that the samples are no longer at the 
zeros. If the signal is over-sampled, either the expression [link] or [link] could be used. They 
both are over-sampled but [link] allows the spectrum of the signal to increase up to the limit 
without distortion while [link] does not. The generalized sampling theorem [link] has a built-in 
filtering action which may or may not be an advantage. 


The application of frames and tight frames to what is called a redundant discrete wavelet 
transform (RDWT) is discussed later in Section: Overcomplete Representations, Frames, 
Redundant Transforms, and Adaptive Bases and their use in Section: Nonlinear Filtering or 
Denoising with the DWT. They are also needed for certain adaptive descriptions discussed at 
Adaptive Bases where an independent subset of the expansion vectors in the frame are chosen 
according to some criterion to give an optimal basis. 


Conditional and Unconditional Bases 


A powerful point of view used by Donoho [link] gives an explanation of which basis systems 
are best for a particular class of signals and why the wavelet system is good for a wide variety 
of signal classes. 


Donoho defines an unconditional basis as follows. If we have a function class $\mathcal{F}$ with a norm 
defined and denoted $\|\cdot\|$, and a basis set $f_k$ such that every function $g \in \mathcal{F}$ has a unique 
representation $g = \sum_k a_k f_k$ with equality defined as a limit using the norm, we consider the 
infinite expansion 

Equation: 
$$g(t) = \sum_k m_k\, a_k\, f_k(t).$$


If for all $g \in \mathcal{F}$, the infinite sum converges for all $|m_k| \le 1$, the basis is called an 
unconditional basis. This is very similar to unconditional or absolute convergence of a 
numerical series [link], [link], [link]. If the convergence depends on $m_k = 1$ for some $g(t)$, 
the basis is called a conditional basis. 


An unconditional basis means all subsequences converge and all sequences of subsequences 
converge. It means convergence does not depend on the order of the terms in the summation 
or on the sign of the coefficients. This implies a very robust basis where the coefficients drop 
off rapidly for all members of the function class. That is indeed the case for wavelets which 
are unconditional bases for a very wide set of function classes [link], [link], [link]. 


Unconditional bases have a special property that makes them near-optimal for signal 
processing in several situations. This property has to do with the geometry of the space of 
expansion coefficients of a class of functions in an unconditional basis. This is described in 
[link]. 


The fundamental idea of bases or frames is representing a continuous function by a sequence 
of expansion coefficients. We have seen that Parseval's theorem relates the $L^2$ norm of the 
function to the $\ell^2$ norm of coefficients for orthogonal bases and tight frames [link]. Different 
function spaces are characterized by different norms on the continuous function. If we have an 
unconditional basis for the function space, the norm of the function in the space not only can 
be related to some norm of the coefficients in the basis expansion, but the absolute values of 
the coefficients have the sufficient information to establish the relation. So there is no 
condition on the sign or phase information of the expansion coefficients if we only care about 
the norm of the function, thus unconditional. 


For this tutorial discussion, it is sufficient to know that there are theoretical reasons why 
wavelets are an excellent expansion system for a wide set of signal processing problems. 
Being an unconditional basis also sets the stage for efficient and effective nonlinear 
processing of the wavelet transform of a signal for compression, denoising, and detection, 
which are discussed in Chapter: The Scaling Function and Scaling Coefficients, Wavelet and 
Wavelet Coefficients. 


The Scaling Function and Scaling Coefficients, Wavelet and Wavelet Coefficients 


We will now look more closely at the basic scaling function and wavelet to see when they 
exist and what their properties are [link], [link], [link], [link], [link], [link], [link]. Using 
the same approach that is used in the theory of differential equations, we will examine the 
properties of $\varphi(t)$ by considering the equation of which it is a solution. The basic 
recursion [link] that comes from the multiresolution formulation is 

Equation: 
$$\varphi(t) = \sum_n h(n)\, \sqrt{2}\, \varphi(2t - n)$$


with $h(n)$ being the scaling coefficients and $\varphi(t)$ being the scaling function which 
satisfies this equation, which is sometimes called the refinement equation, the dilation 
equation, or the multiresolution analysis (MRA) equation. 


In order to state the properties accurately, some care has to be taken in specifying just 
what classes of functions are being considered or are allowed. We will attempt to walk a 
fine line to present enough detail to be correct but not so much as to obscure the main 
ideas and results. A few of these ideas were presented in Section: Signal Spaces and a few 
more will be given in the next section. A more complete discussion can be found in 
[link], in the introductions to [link], [link], [link], or in any book on functional analysis. 


Tools and Definitions 


Signal Classes 


There are three classes of signals that we will be using. The most basic is called $L^2(\mathbb{R})$, 
which contains all functions which have a finite, well-defined integral of the square: 
$f \in L^2 \Rightarrow \int |f(t)|^2\, dt = E < \infty$. This class is important because it is a 
generalization of normal Euclidean geometry and because it gives a simple representation 
of the energy in a signal. 


The next most basic class is $L^1(\mathbb{R})$, which requires a finite integral of the absolute value 
of the function: $f \in L^1 \Rightarrow \int |f(t)|\, dt = K < \infty$. This class is important because 
one may interchange infinite summations and integrations with these functions although 
not necessarily with $L^2$ functions. These classes of function spaces can be generalized to 
those with $\int |f(t)|^p\, dt = K < \infty$ and designated $L^p$. 


A more general class of signals than any $L^p$ space contains what are called distributions. 
These are generalized functions which are not defined by their having "values" but by the 
value of an "inner product" with a normal function. An example of a distribution would 
be the Dirac delta function $\delta(t)$ where it is defined by the property: 
$f(T) = \int f(t)\, \delta(t - T)\, dt$. 


Another detail to keep in mind is that the integrals used in these definitions are Lebesgue 
integrals which are somewhat more general than the basic Riemann integral. The value of 
a Lebesgue integral is not affected by values of the function over any countable set of 
values of its argument (or, more generally, a set of measure zero). A function defined as 
one on the rationals and zero on the irrationals would have a zero Lebesgue integral. As a 
result of this, properties derived using measure theory and Lebesgue integrals are 
sometimes said to be true "almost everywhere," meaning they may not be true over a set of 
measure zero. 


Many of these ideas of function spaces, distributions, Lebesgue measure, etc. came out of 
the early study of Fourier series and transforms. It is interesting that they are also 
important in the theory of wavelets. As with Fourier theory, one can often ignore the 
signal space classes and can use distributions as if they were functions, but there are some 
cases where these ideas are crucial. For an introductory reading of this book or of the 
literature, one can usually skip over the signal space designation or assume Riemann 
integrals. However, when a contradiction or paradox seems to arise, its resolution will 
probably require these details. 


Fourier Transforms 


We will need the Fourier transform of $\varphi(t)$ which, if it exists, is defined to be 
Equation: 
$$\Phi(\omega) = \int_{-\infty}^{\infty} \varphi(t)\, e^{-i \omega t}\, dt$$


and the discrete-time Fourier transform (DTFT) [link] of $h(n)$ defined to be 
Equation: 
$$H(\omega) = \sum_{n=-\infty}^{\infty} h(n)\, e^{-i \omega n}$$


where $i = \sqrt{-1}$ and $n$ is an integer ($n \in \mathbb{Z}$). If convolution with $h(n)$ is viewed as a 
digital filter, as defined in Section: Analysis - From Fine Scale to Coarse Scale, then the 
DTFT of $h(n)$ is the filter's frequency response [link], [link], which is $2\pi$ periodic. 


If $\Phi(\omega)$ exists, the defining recursive equation [link] becomes 


Equation: 
$$\Phi(\omega) = \frac{1}{\sqrt{2}}\, H(\omega/2)\, \Phi(\omega/2)$$


which after iteration becomes 
Equation: 
$$\Phi(\omega) = \prod_{k=1}^{\infty} \left\{ \frac{1}{\sqrt{2}}\, H\!\left(\frac{\omega}{2^k}\right) \right\} \Phi(0)$$


if $\sum_n h(n) = \sqrt{2}$ and $\Phi(0)$ is well defined. This may be a distribution or it may be a 
smooth function depending on $H(\omega)$ and, therefore, $h(n)$ [link], [link]. This makes sense 
only if $\Phi(0)$ is well defined. Although [link] and [link] are equivalent term-by-term, the 
requirement of $\Phi(0)$ being well defined and the nature of the limits in the appropriate 
function spaces may make one preferable over the other. Notice how the zeros of $H(\omega)$ 
determine the zeros of $\Phi(\omega)$. 
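The infinite product can be explored numerically by truncating it. The sketch below assumes the DTFT convention $H(\omega) = \sum_n h(n) e^{-i \omega n}$, the normalization $\Phi(0) = 1$, and uses the Haar coefficients $h = (1/\sqrt{2}, 1/\sqrt{2})$ as a test case because the resulting scaling function is the unit box on $[0, 1)$, whose transform is known in closed form. The function names and the number of retained factors are our own choices.

```python
import cmath
import math

def dtft(h, w):
    """H(w) = sum over n of h(n) exp(-i w n) for a causal finite sequence h."""
    return sum(hn * cmath.exp(-1j * w * n) for n, hn in enumerate(h))

def phi_hat(h, w, terms=40, phi0=1.0):
    """Truncated product of H(w / 2^k) / sqrt(2) for k = 1..terms, times Phi(0)."""
    prod = complex(phi0)
    for k in range(1, terms + 1):
        prod *= dtft(h, w / 2.0 ** k) / math.sqrt(2.0)
    return prod

haar = [1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0)]

w = 2.5
approx = phi_hat(haar, w)
# Fourier transform of the unit box on [0, 1) for comparison.
exact = cmath.exp(-0.5j * w) * math.sin(w / 2.0) / (w / 2.0)
# H(pi) = 0 for Haar, so the product vanishes at w = 2 pi.
at_two_pi = phi_hat(haar, 2.0 * math.pi)
```

The value at $\omega = 2\pi$ illustrates the closing remark of the paragraph: the first factor of the product is $H(\pi)$, so a zero of $H$ at $\pi$ becomes a zero of $\Phi$ at $2\pi$.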


Refinement and Transition Matrices 


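There are two matrices that are particularly important to determining the properties of 
wavelet systems. The first is the refinement matrix $M$, which is obtained from the basic 
recursion equation [link] by evaluating $\varphi(t)$ at integers [link], [link], [link], [link], [link]. 
This looks like a convolution matrix with the even (or odd) rows removed. Two particular 
submatrices that are used later in [link] to evaluate $\varphi(t)$ on the dyadic rationals are 
illustrated for $N = 6$ by 


Equation: 
$$\sqrt{2} \begin{bmatrix} h_0 & 0 & 0 & 0 & 0 & 0 \\ h_2 & h_1 & h_0 & 0 & 0 & 0 \\ h_4 & h_3 & h_2 & h_1 & h_0 & 0 \\ 0 & h_5 & h_4 & h_3 & h_2 & h_1 \\ 0 & 0 & 0 & h_5 & h_4 & h_3 \\ 0 & 0 & 0 & 0 & 0 & h_5 \end{bmatrix} \begin{bmatrix} \varphi_0 \\ \varphi_1 \\ \varphi_2 \\ \varphi_3 \\ \varphi_4 \\ \varphi_5 \end{bmatrix} = \begin{bmatrix} \varphi_0 \\ \varphi_1 \\ \varphi_2 \\ \varphi_3 \\ \varphi_4 \\ \varphi_5 \end{bmatrix}$$


which we write in matrix form as 
Equation: 
$$M_0\, \underline{\varphi} = \underline{\varphi}$$


with $M_0$ being the $6 \times 6$ matrix of the $h(n)$ and $\underline{\varphi}$ being $6 \times 1$ vectors of integer 
samples of $\varphi(t)$. In other words, the vector $\underline{\varphi}$ with entries $\varphi(k)$ is the eigenvector of $M_0$ 
for an eigenvalue of unity. 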


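The second submatrix is a shifted version illustrated by 


Equation: 
$$\sqrt{2} \begin{bmatrix} h_1 & h_0 & 0 & 0 & 0 & 0 \\ h_3 & h_2 & h_1 & h_0 & 0 & 0 \\ h_5 & h_4 & h_3 & h_2 & h_1 & h_0 \\ 0 & 0 & h_5 & h_4 & h_3 & h_2 \\ 0 & 0 & 0 & 0 & h_5 & h_4 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} \varphi_0 \\ \varphi_1 \\ \varphi_2 \\ \varphi_3 \\ \varphi_4 \\ \varphi_5 \end{bmatrix} = \begin{bmatrix} \varphi_{1/2} \\ \varphi_{3/2} \\ \varphi_{5/2} \\ \varphi_{7/2} \\ \varphi_{9/2} \\ \varphi_{11/2} \end{bmatrix}$$


with the matrix being denoted $M_1$. The general refinement matrix $M$ is the infinite 
matrix of which $M_0$ and $M_1$ are partitions. If the matrix $H$ is the convolution matrix for 
$h(n)$, we can denote the $M$ matrix by $[\downarrow 2] H$ to indicate the down-sampled convolution 
matrix $H$. Clearly, for $\varphi(t)$ to be defined on the dyadic rationals, $M_0$ must have a unity 
eigenvalue. 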
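The two submatrices can be built and used in a few lines. The sketch below is an illustration under stated assumptions: it uses the Daubechies length-4 coefficients (not listed in this section), builds the matrices from the entry rule $\sqrt{2}\, h(2k + s - j)$ with $s = 0$ for $M_0$ and $s = 1$ for $M_1$, and finds the unity-eigenvalue eigenvector of $M_0$ by power iteration, which relies on unity being the largest-magnitude eigenvalue (true for this filter, not in general). All function names are hypothetical.

```python
import math

def daubechies4():
    """Daubechies length-4 scaling coefficients (assumed example filter)."""
    s3 = math.sqrt(3.0)
    d = 4.0 * math.sqrt(2.0)
    return [(1 + s3) / d, (3 + s3) / d, (3 - s3) / d, (1 - s3) / d]

def refinement_matrix(h, shift=0):
    """N x N block with entries sqrt(2) h(2k + shift - j); shift 0 is M0, shift 1 is M1."""
    n = len(h)
    def coef(i):
        return h[i] if 0 <= i < n else 0.0
    return [[math.sqrt(2.0) * coef(2 * k + shift - j) for j in range(n)]
            for k in range(n)]

def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]

h = daubechies4()
M0 = refinement_matrix(h, 0)
M1 = refinement_matrix(h, 1)

# Power iteration toward the eigenvector of M0 with eigenvalue one.
phi_int = [1.0 / len(h)] * len(h)
for _ in range(200):
    phi_int = matvec(M0, phi_int)
total = sum(phi_int)
phi_int = [x / total for x in phi_int]   # normalize so the integer samples sum to one

# Values of the scaling function at the half integers 1/2, 3/2, 5/2, 7/2.
phi_half = matvec(M1, phi_int)
```

Repeating the last step with the recursion equation itself gives values at the quarter integers, and so on through the dyadic rationals.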


A third, less obvious but perhaps more important, matrix is called the transition matrix $T$ 
and it is built up from the autocorrelation matrix of $h(n)$. The transition matrix is 
constructed by 

Equation: 
$$T = [\downarrow 2]\, H H^T.$$


This matrix (sometimes called the Lawton matrix) was used by Lawton (who originally 
called it the Wavelet-Galerkin matrix) [link] to derive necessary and sufficient conditions 
for an orthogonal wavelet basis. As we will see later in this chapter, its eigenvalues are 
also important in determining the properties of $\varphi(t)$ and the associated wavelet system. 


Necessary Conditions 


Theorem 1 If $\varphi(t) \in L^1$ is a solution to the basic recursion equation [link] and if 
$\int \varphi(t)\, dt \ne 0$, then 
Equation: 
$$\sum_n h(n) = \sqrt{2}.$$


The proof of this theorem requires only an interchange in the order of a summation and 
integration (allowed in $L^1$) but no assumption of orthogonality of the basis functions or 
any other properties of $\varphi(t)$ other than a nonzero integral. The proof of this theorem and 
several of the others stated here are contained in Appendix A. 


This theorem shows that, unlike linear constant coefficient differential equations, not just 
any set of coefficients will support a solution. The coefficients must satisfy the linear 
equation [link]. This is the weakest condition on the h(n). 


Theorem 2 If $\varphi(t)$ is an $L^1$ solution to the basic recursion equation [link] with 
$\int \varphi(t)\, dt = 1$, and 
Equation: 
$$\sum_\ell \varphi(t - \ell) = \sum_\ell \varphi(\ell) = 1$$


with $\Phi(\pi + 2\pi k) \ne 0$ for some $k$, then 
Equation: 
$$\sum_n h(2n) = \sum_n h(2n+1)$$


where [link] may have to be a distributional sum. Conversely, if [link] is satisfied, then 
[link] is true. 


Equation [link] is called the fundamental condition, and it is weaker than requiring 
orthogonality but stronger than [link]. It is simply a result of requiring the equations 
resulting from evaluating [link] on the integers be consistent. Equation [link] is called a 
partitioning of unity (or the Strang condition or the Schoenberg condition). 


A similar theorem by Cavaretta, Dahmen and Micchelli [link] and by Jia [link] states that 
if $\varphi \in L^p$ and the integer translates of $\varphi(t)$ form a Riesz basis for the space they span, 
then $\sum_n h(2n) = \sum_n h(2n+1)$. 


Theorem 3 If $\varphi(t)$ is an $L^2 \cap L^1$ solution to [link] and if integer translates of $\varphi(t)$ are 
orthogonal as defined by 
Equation: 
$$\int \varphi(t)\, \varphi(t - k)\, dt = E\, \delta(k) = \begin{cases} E & \text{if } k = 0 \\ 0 & \text{otherwise,} \end{cases}$$


then 
Equation: 
$$\sum_n h(n)\, h(n - 2k) = \delta(k) = \begin{cases} 1 & \text{if } k = 0 \\ 0 & \text{otherwise.} \end{cases}$$


Notice that this does not depend on a particular normalization of $\varphi(t)$. 


If $\varphi(t)$ is normalized by dividing by the square root of its energy $\sqrt{E}$, then integer 
translates of $\varphi(t)$ are orthonormal as defined by 
Equation: 
$$\int \varphi(t)\, \varphi(t - k)\, dt = \delta(k) = \begin{cases} 1 & \text{if } k = 0 \\ 0 & \text{otherwise.} \end{cases}$$


This theorem shows that in order for the solutions of [link] to be orthogonal under integer 
translation, it is necessary that the coefficients of the recursive equation be orthogonal 
themselves after decimating or downsampling by two. If $\varphi(t)$ and/or $h(n)$ are complex 
functions, complex conjugation must be used in [link], [link], and [link]. 


Coefficients h(n) that satisfy [link] are called a quadrature mirror filter (QMF) or 
conjugate mirror filter (CMF), and the condition [link] is called the quadratic condition 
for obvious reasons. 


Corollary 1 Under the assumptions of Theorem [link], the norm of $h(n)$ is automatically 
unity. 


Equation: 
$$\sum_n |h(n)|^2 = 1$$


Not only must the sum of $h(n)$ equal $\sqrt{2}$, but for orthogonality of the solution, the sum 
of the squares of $h(n)$ must be one, both independent of any normalization of $\varphi(t)$. This 
unity normalization of $h(n)$ is the result of the $\sqrt{2}$ term in [link]. 


Corollary 2 Under the assumptions of Theorem [link], 
Equation: 
$$\sum_n h(2n) = \sum_n h(2n+1) = \frac{1}{\sqrt{2}}$$


This result is derived in the Appendix by showing that not only must the sum of $h(n)$ 
equal $\sqrt{2}$, but for orthogonality of the solution, the individual sums of the even and odd 
terms in $h(n)$ must be $1/\sqrt{2}$, independent of any normalization of $\varphi(t)$. Although stated 
here as necessary for orthogonality, the results hold under weaker non-orthogonal 
conditions as is stated in Theorem [link]. 
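The linear condition, the quadratic condition, and the two corollaries are easy to verify for a specific filter. The sketch below uses the Daubechies length-4 coefficients as an assumed example (they are not listed in this section); the helper names are our own.

```python
import math

def daubechies4():
    """Daubechies length-4 scaling coefficients (assumed example filter)."""
    s3 = math.sqrt(3.0)
    d = 4.0 * math.sqrt(2.0)
    return [(1 + s3) / d, (3 + s3) / d, (3 - s3) / d, (1 - s3) / d]

def double_shift_corr(h, k):
    """sum over n of h(n) h(n - 2k) for a finite causal sequence."""
    n = len(h)
    return sum(h[i] * h[i - 2 * k] for i in range(n) if 0 <= i - 2 * k < n)

h = daubechies4()
total = sum(h)                       # Theorem 1: should equal sqrt(2)
energy = sum(x * x for x in h)       # Corollary 1: should equal 1
even_sum = sum(h[0::2])              # Corollary 2: should equal 1/sqrt(2)
odd_sum = sum(h[1::2])               # Corollary 2: should equal 1/sqrt(2)
shifts = [double_shift_corr(h, k) for k in range(-1, 2)]   # Theorem 3: [0, 1, 0]
```

Replacing `h` with an arbitrary sequence that merely sums to $\sqrt{2}$ will generally satisfy the first check only, which is the sense in which the linear condition is the weakest.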


Theorem 4 If $\varphi(t)$ has compact support on $0 \le t \le N - 1$ and if $\varphi(t - k)$ are linearly 
independent, then $h(n)$ also has compact support over $0 \le n \le N - 1$: 
Equation: 
$$h(n) = 0 \quad \text{for } n < 0 \text{ and } n > N - 1$$


Thus $N$ is the length of the $h(n)$ sequence. 


If the translates are not independent (or some equivalent restriction), one can have $h(n)$ 
with infinite support while $\varphi(t)$ has finite support [link]. 


These theorems state that if $\varphi(t)$ has compact support and is orthogonal over integer 
translates, $N/2$ bilinear or quadratic equations [link] must be satisfied in addition to the one 
linear equation [link]. The support or length of $h(n)$ is $N$, which must be an even 
number. The number of degrees of freedom in choosing these $N$ coefficients is then 
$N/2 - 1$. This freedom will be used in the design of a wavelet system developed 
in Chapter: Regularity, Moments, and Wavelet System Design and elsewhere. 
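For $N = 4$ the count leaves a single degree of freedom. One way to see this is a one-parameter family of length-4 sequences indexed by an angle $\alpha$; the particular parameterization below is a known construction that we bring in as an assumption (it is not taken from this section), and the function names are hypothetical. Every value of $\alpha$ satisfies the linear and quadratic conditions, and $\alpha = \pi/3$ reproduces the Daubechies length-4 coefficients.

```python
import math

def length4_filter(alpha):
    """One-parameter family of length-4 coefficients (assumed parameterization)."""
    c, s = math.cos(alpha), math.sin(alpha)
    d = 2.0 * math.sqrt(2.0)
    return [(1 - c + s) / d, (1 + c + s) / d, (1 + c - s) / d, (1 - c - s) / d]

def satisfies_conditions(h, tol=1e-12):
    """Check the linear condition and the two quadratic conditions for N = 4."""
    linear = abs(sum(h) - math.sqrt(2.0)) < tol
    unit_norm = abs(sum(x * x for x in h) - 1.0) < tol
    shift_two = abs(h[0] * h[2] + h[1] * h[3]) < tol
    return linear and unit_norm and shift_two

checks = [satisfies_conditions(length4_filter(a))
          for a in (0.0, 0.4, math.pi / 3, 2.0)]
d4 = length4_filter(math.pi / 3)      # Daubechies length-4 member of the family
haar = length4_filter(math.pi / 2)    # Haar coefficients padded with two zeros
```

Design then amounts to choosing $\alpha$ by some additional criterion, such as smoothness of the resulting scaling function.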


Frequency Domain Necessary Conditions 


We turn next to frequency domain versions of the necessary conditions for the existence 
of $\varphi(t)$. Some care must be taken in specifying the space of functions that the Fourier 
transform operates on and the space that the transform resides in. We do not go into those 
details in this book but the reader can consult [link]. 


Theorem 5 If $\varphi(t)$ is an $L^1$ solution of the basic recursion equation [link], then the 
following equivalent conditions must be true: 
Equation: 
$$\sum_n h(n) = H(0) = \sqrt{2}$$


This follows directly from [link] and states that the basic existence requirement [link] is 
equivalent to requiring that the FIR filter's frequency response at DC ($\omega = 0$) be $\sqrt{2}$. 


Theorem 6 For $h(n) \in \ell^1$, 
Equation: 
$$\sum_n h(2n) = \sum_n h(2n+1) \quad \text{if and only if} \quad H(\pi) = 0$$


which says the frequency response of the FIR filter with impulse response $h(n)$ is zero at 
the so-called Nyquist frequency ($\omega = \pi$). This follows from [link] and [link], and 
supports the fact that $h(n)$ is a lowpass digital filter. This is also equivalent to the $M$ and 
$T$ matrices having a unity eigenvalue. 


Theorem 7 If $\varphi(t)$ is a solution to [link] in $L^2 \cap L^1$ and $\Phi(\omega)$ is a solution of [link] such 
that $\Phi(0) \ne 0$, then 
Equation: 
$$\int \varphi(t)\, \varphi(t - k)\, dt = \delta(k) \quad \text{if and only if} \quad \sum_\ell |\Phi(\omega + 2\pi\ell)|^2 = 1$$


This is a frequency domain equivalent to the time domain definition of orthogonality of 
the scaling function [link], [link], [link]. It allows applying the orthonormal conditions to 
frequency domain arguments. It also gives insight into just what time domain 
orthogonality requires in the frequency domain. 


Theorem 8 For any $h(n) \in \ell^1$, 
Equation: 
$$\sum_n h(n)\, h(n - 2k) = \delta(k) \quad \text{if and only if} \quad |H(\omega)|^2 + |H(\omega + \pi)|^2 = 2$$


This theorem [link], [link], [link] gives equivalent time and frequency domain conditions 
on the scaling coefficients and states that the orthogonality requirement [link] is 
equivalent to the FIR filter with $h(n)$ as coefficients being what is called a Quadrature 
Mirror Filter (QMF) [link]. Note that [link], [link], and [link] require $|H(\pi/2)| = 1$ and 
that the filter is a "half band" filter. 
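The frequency domain conditions of Theorems 5, 6, and 8 can be checked on a grid of frequencies. The sketch below assumes the DTFT convention $H(\omega) = \sum_n h(n) e^{-i \omega n}$ and again uses the Daubechies length-4 coefficients as an example filter; the grid size and names are arbitrary choices.

```python
import cmath
import math

def daubechies4():
    """Daubechies length-4 scaling coefficients (assumed example filter)."""
    s3 = math.sqrt(3.0)
    d = 4.0 * math.sqrt(2.0)
    return [(1 + s3) / d, (3 + s3) / d, (3 - s3) / d, (1 - s3) / d]

def dtft(h, w):
    """H(w) = sum over n of h(n) exp(-i w n) for a causal finite sequence h."""
    return sum(hn * cmath.exp(-1j * w * n) for n, hn in enumerate(h))

h = daubechies4()
H0 = dtft(h, 0.0)                    # Theorem 5: sqrt(2) at DC
Hpi = dtft(h, math.pi)               # Theorem 6: zero at the Nyquist frequency
grid = [2 * math.pi * m / 64 for m in range(64)]
power_sum = [abs(dtft(h, w)) ** 2 + abs(dtft(h, w + math.pi)) ** 2
             for w in grid]          # Theorem 8: constant and equal to 2
H_half = abs(dtft(h, math.pi / 2))   # half-band property: magnitude 1 at pi/2
```

A grid check of this kind is only evidence, not a proof, but for a finite filter the quantity in Theorem 8 is a trigonometric polynomial, so agreement on enough points is in fact conclusive.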


Sufficient Conditions 


The above are necessary conditions for $\varphi(t)$ to exist and the following are sufficient. 
There are many forms these could and do take but we present the following as examples 
and give references for more detail [link], [link], [link], [link], [link], [link], [link], [link], 
[link], [link]. 


Theorem 9 If $\sum_n h(n) = \sqrt{2}$ and $h(n)$ has finite support or decays fast enough so 
that $\sum_n |h(n)| (1 + |n|)^{\epsilon} < \infty$ for some $\epsilon > 0$, then a unique (within a scalar multiple) 
$\varphi(t)$ (perhaps a distribution) exists that satisfies [link] and whose distributional Fourier 
transform satisfies [link]. 


This [link], [link], [link] can be obtained in the frequency domain by considering the 
convergence of [link]. It has recently been obtained using a much more powerful 
approach in the time domain by Lawton [link]. 


Because this theorem uses the weakest possible condition, the results are weak. The 
scaling function obtained from only requiring $\sum_n h(n) = \sqrt{2}$ may be so poorly behaved 
as to be impossible to calculate or use. The worst cases will not support a multiresolution 
analysis or provide a useful expansion system. 


Theorem 10 If $\sum_n h(2n) = \sum_n h(2n+1) = 1/\sqrt{2}$ and $h(n)$ has finite support or 
decays fast enough so that $\sum_n |h(n)| (1 + |n|)^{\epsilon} < \infty$ for some $\epsilon > 0$, then a $\varphi(t)$ 
(perhaps a distribution) that satisfies [link] exists, is unique, and is well-defined on the 
dyadic rationals. In addition, the distributional sum 


Equation: 
$$\sum_k \varphi(t - k) = 1$$


holds. 


This condition, called the fundamental condition [link], [link], gives a slightly tighter 
result than Theorem [link]. While the scaling function still may be a distribution not in $L^1$ 
or $L^2$, it is better behaved than required by Theorem [link] in being defined on the dense 
set of dyadic rationals. This theorem is equivalent to requiring $H(\pi) = 0$ which from the 
product formula [link] gives a better behaved $\Phi(\omega)$. It also guarantees a unity eigenvalue 
for $M$ and $T$ but not that other eigenvalues do not exist with magnitudes larger than one. 


The next several theorems use the transition matrix T defined in [link] which is a down- 
sampled autocorrelation matrix. 


Theorem 11 If the transition matrix $T$ has eigenvalues on or in the unit circle of the 
complex plane and if any on the unit circle are multiple, they have a complete set of 
eigenvectors, then $\varphi(t) \in L^2$. 


If $T$ has unity magnitude eigenvalues, the successive approximation algorithm (cascade 
algorithm) [link] converges weakly to $\varphi(t) \in L^2$ [link]. 


Theorem 12 If the transition matrix $T$ has a simple unity eigenvalue with all other 
eigenvalues having magnitude less than one, then $\varphi(t) \in L^2$. 


Here the successive approximation algorithm (cascade algorithm) converges strongly to 
$\varphi(t) \in L^2$. This is developed in [link]. 


If in addition to requiring [link], we require the quadratic coefficient conditions [link], a 
tighter result occurs which gives $\varphi(t) \in L^2(\mathbb{R})$ and a multiresolution tight frame 
system. 


Theorem 13 (Lawton) If $h(n)$ has finite support or decays fast enough and if 
$\sum_n h(n) = \sqrt{2}$ and if $\sum_n h(n)\, h(n - 2k) = \delta(k)$, then $\varphi(t) \in L^2(\mathbb{R})$ exists, and 
generates a wavelet system that is a tight frame in $L^2$. 


This important result from Lawton [link], [link] gives the sufficient conditions for $\varphi(t)$ to 
exist and generate wavelet tight frames. The proof uses an iteration of the basic recursion 
equation [link] as a successive approximation similar to Picard's method for differential 
equations. Indeed, this method is used to calculate $\varphi(t)$ in [link]. It is interesting to note 
that the scaling function may be very rough, even "fractal" in nature. This may be 
desirable if the signal being analyzed is also rough. 
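The successive approximation idea can be sketched in discrete form: start from a unit box, and at each step up-sample the current samples by two and convolve with $\sqrt{2}\, h(n)$. After $J$ steps the sequence is a piecewise-constant approximation to the scaling function on a grid of spacing $2^{-J}$. The code below is a minimal illustration under those assumptions, using the Daubechies length-4 coefficients as an example; it is not presented as the algorithm used elsewhere in the book, and the names are our own.

```python
import math

def daubechies4():
    """Daubechies length-4 scaling coefficients (assumed example filter)."""
    s3 = math.sqrt(3.0)
    d = 4.0 * math.sqrt(2.0)
    return [(1 + s3) / d, (3 + s3) / d, (3 - s3) / d, (1 - s3) / d]

def cascade(h, iterations):
    """Iterate x <- (x up-sampled by 2) convolved with sqrt(2) h, starting from a box."""
    c = [math.sqrt(2.0) * hn for hn in h]
    x = [1.0]                                   # one sample of the starting box function
    for _ in range(iterations):
        y = [0.0] * (2 * (len(x) - 1) + len(c))
        for k, xk in enumerate(x):
            for n, cn in enumerate(c):
                y[2 * k + n] += xk * cn         # up-sample by two, then convolve
        x = y
    return x

J = 8
phi = cascade(daubechies4(), J)                 # phi[m] approximates the value near t = m / 2**J
step = 2 ** J
area = sum(phi) / step                          # Riemann sum of the approximation
partition = [sum(phi[m::step]) for m in range(step)]   # sums over integer translates
```

Two invariants hold at every iteration for this filter: the approximation integrates to one, and its integer translates sum to one, which is the partitioning of unity discussed earlier. Plotting `phi` against `m / 2**J` shows the rough character of the function mentioned above.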


Although this theorem guarantees that $\varphi(t)$ generates a tight frame, in most practical 
situations, the resulting system is an orthonormal basis [link]. The conditions in the 
following theorems are generally satisfied. 


Theorem 14 (Lawton) If $h(n)$ has compact support, $\sum_n h(n) = \sqrt{2}$, and 
$\sum_n h(n)\, h(n - 2k) = \delta(k)$, then $\varphi(t - k)$ forms an orthogonal set if and only if the 
transition matrix $T$ has a simple unity eigenvalue. 


This powerful result allows a simple evaluation of h(n) to see if it can support a wavelet 
expansion system [link], [link], [link]. An equivalent result using the frequency response 
of the FIR digital filter formed from h(n) was given by Cohen. 
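A rough numerical look at the transition matrix is sketched below. It assumes the entries $T_{ij} = r(2i - j)$ with $r(k) = \sum_n h(n) h(n - k)$, restricted to indices $-(N-2), \ldots, N-2$ (our assumption about the relevant finite section), and uses the Daubechies length-4 coefficients as an example. Rather than computing all eigenvalues, it applies $T$ repeatedly to a generic vector: if the iterates settle on a single fixed vector, that is consistent with a simple unity eigenvalue dominating the others. This is an indication only, not a substitute for an eigenvalue computation.

```python
import math

def daubechies4():
    """Daubechies length-4 scaling coefficients (assumed example filter)."""
    s3 = math.sqrt(3.0)
    d = 4.0 * math.sqrt(2.0)
    return [(1 + s3) / d, (3 + s3) / d, (3 - s3) / d, (1 - s3) / d]

def autocorr(h, k):
    """r(k) = sum over n of h(n) h(n - k) for a finite causal sequence."""
    n = len(h)
    return sum(h[i] * h[i - k] for i in range(n) if 0 <= i - k < n)

def transition_matrix(h):
    """T[i][j] = r(2i - j) for i, j = -(N-2), ..., N-2 (down-sampled autocorrelation)."""
    m = len(h) - 2
    idx = range(-m, m + 1)
    return [[autocorr(h, 2 * i - j) for j in idx] for i in idx]

def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]

h = daubechies4()
T = transition_matrix(h)

v = [1.0 / len(T)] * len(T)          # generic starting vector
for _ in range(100):
    v = matvec(T, v)
residual = max(abs(a - b) for a, b in zip(matvec(T, v), v))
```

For this filter the iterates converge to the unit vector at the center index, reflecting $r(2k) = \delta(k)$ for orthogonal coefficients.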


Theorem 15 (Cohen) If $H(\omega)$ is the DTFT of $h(n)$ with compact support and 
$\sum_n h(n) = \sqrt{2}$ with $\sum_n h(n)\, h(n - 2k) = \delta(k)$, and if $H(\omega) \ne 0$ for 
$-\pi/3 \le \omega \le \pi/3$, then the $\varphi(t - k)$ satisfying [link] generate an orthonormal basis in 
$L^2$. 


A slightly weaker version of this frequency domain sufficient condition is easier to prove 
[link], [link] and to extend to the M-band case for the case of no zeros allowed in 
$-\pi/2 \le \omega \le \pi/2$ [link]. There are other sufficient conditions that, together with those in 
Theorem [link], will guarantee an orthonormal basis. Daubechies' vanishing moments 
will guarantee an orthogonal basis. 
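Cohen's condition can be probed by sampling the frequency response over $-\pi/3 \le \omega \le \pi/3$. The sketch below assumes the DTFT convention $H(\omega) = \sum_n h(n) e^{-i \omega n}$ and uses the Daubechies length-4 coefficients as an example; a finite grid cannot rule out a zero between sample points, so this should be read as a sanity check rather than a proof.

```python
import cmath
import math

def daubechies4():
    """Daubechies length-4 scaling coefficients (assumed example filter)."""
    s3 = math.sqrt(3.0)
    d = 4.0 * math.sqrt(2.0)
    return [(1 + s3) / d, (3 + s3) / d, (3 - s3) / d, (1 - s3) / d]

def dtft(h, w):
    """H(w) = sum over n of h(n) exp(-i w n) for a causal finite sequence h."""
    return sum(hn * cmath.exp(-1j * w * n) for n, hn in enumerate(h))

h = daubechies4()
# 401 equally spaced frequencies from -pi/3 to pi/3, end points included.
grid = [-math.pi / 3 + (2 * math.pi / 3) * m / 400 for m in range(401)]
min_mag = min(abs(dtft(h, w)) for w in grid)   # stays well away from zero
```

For this filter the magnitude decreases monotonically away from $\omega = 0$, so the minimum over the band occurs at the end points.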


Theorems [link], [link], and [link] show that h(n) has the characteristics of a lowpass FIR 
digital filter. We will later see that the FIR filter made up of the wavelet coefficients is a 
high pass filter and the filter bank view developed in Chapter: Filter Banks and the 
Discrete Wavelet Transform and Section: Multiplicity-M (M-Band) Scaling Functions 
and Wavelets further explains this view. 


Theorem 16 If h(n) has finite support and if φ(t) ∈ L¹, then φ(t) has finite support 
[link]. 


If φ(t) is not restricted to L¹, it may have infinite support even if h(n) has finite support. 


These theorems give a good picture of the relationship between the recursive equation 
coefficients h(n) and the scaling function φ(t) as a solution of [link]. More properties 
and characteristics are presented in [link]. 


Wavelet System Design 


One of the main purposes for presenting the rather theoretical results of this chapter is to 
set up the conditions for designing wavelet systems. One approach is to require the 
minimum sufficient conditions as constraints in an optimization or approximation, then 
use the remaining degrees of freedom to choose h(n) that will give the best signal 
representation, decomposition, or compression. In some cases, the sufficient conditions 
are overly restrictive and it is worthwhile to use the necessary conditions and then check 
the design to see if it is satisfactory. In many cases, wavelet systems are designed by a 
frequency domain design of H(ω) using digital filter design techniques with 
wavelet-based constraints. 


The Wavelet 


Although this chapter is primarily about the scaling function, some basic wavelet 
properties are included here. 


Theorem 17 If the scaling coefficients h(n) satisfy the conditions for existence and 
orthogonality of the scaling function and the wavelet is defined by [link], then the integer 
translates of this wavelet span W₀, the orthogonal complement of V₀, both being in V₁, 
i.e., the wavelet is orthogonal to the scaling function at the same scale, 

Equation: 

$$ \langle \varphi(t-n), \psi(t-m) \rangle = \int \varphi(t-n)\, \psi(t-m)\, dt = 0, $$

if and only if the coefficients h₁(n) are given by 
Equation: 

$$ h_1(n) = \pm (-1)^n\, h(N-n) $$

where N is an arbitrary odd integer chosen to conveniently position h₁(n). 
An outline proof is in Appendix A. 


Theorem 18 If the scaling coefficients h(n) satisfy the conditions for existence and 
orthogonality of the scaling function and the wavelet is defined by [link], then the integer 
translates of this wavelet span W₀, the orthogonal complement of V₀, both being in V₁, 
i.e., the wavelet is orthogonal to the scaling function at the same scale. If 

Equation: 

$$ \int \varphi(t-n)\, \psi(t-m)\, dt = 0 $$

then 
Equation: 

$$ \sum_n h(n)\, h_1(n-2k) = 0 $$

which is derived in Appendix A, [link]. 
The translation orthogonality and scaling function-wavelet orthogonality conditions in 
[link] and [link] can be combined to give 
Equation: 

$$ \sum_n h_\ell(n)\, h_m(n-2k) = \delta(k)\, \delta(\ell-m) $$

if h₀(n) is defined as h(n). 
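
As a small numerical illustration of these orthogonality relations, the sketch below builds wavelet coefficients from scaling coefficients with the alternating flip h₁(n) = (−1)ⁿ h(N − n) and evaluates the shifted inner products. The function names are hypothetical, the sign is taken as +, and N is chosen as the filter length minus one (which is odd for an even-length filter) so that h₁(n) occupies the same index range as h(n).

```python
import math

def wavelet_coefficients(h):
    """h1(n) = (-1)^n h(N - n), with N = len(h) - 1 (odd for even-length h)."""
    N = len(h) - 1
    return [(-1) ** n * h[N - n] for n in range(len(h))]

def shifted_inner_product(a, b, k):
    """sum_n a(n) b(n - 2k), both sequences taken as zero outside their support."""
    return sum(a[n] * b[n - 2 * k]
               for n in range(len(a)) if 0 <= n - 2 * k < len(b))

r2, r3 = math.sqrt(2), math.sqrt(3)
# Length-4 Daubechies scaling coefficients, normalized so that sum h(n) = sqrt(2).
h0 = [(1 + r3) / (4 * r2), (3 + r3) / (4 * r2),
      (3 - r3) / (4 * r2), (1 - r3) / (4 * r2)]
h1 = wavelet_coefficients(h0)

for k in (-1, 0, 1):
    print(k,
          round(shifted_inner_product(h0, h0, k), 12),   # delta(k)
          round(shifted_inner_product(h0, h1, k), 12),   # 0
          round(shifted_inner_product(h1, h1, k), 12))   # delta(k)
```

The three printed columns correspond to the cases (ℓ, m) = (0, 0), (0, 1), and (1, 1) of the combined condition.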


Theorem 19 If h(n) satisfies the linear and quadratic admissibility conditions of [link] 
and [link], then 

Equation: 

$$ \sum_n h_1(n) = H_1(0) = 0, $$

Equation: 

$$ |H_1(\omega)| = |H(\omega+\pi)|, $$

Equation: 

$$ |H(\omega)|^2 + |H_1(\omega)|^2 = 2, $$

and 
Equation: 

$$ \int \psi(t)\, dt = 0. $$

The wavelet is usually scaled so that its norm is unity. 


The results in this section have not included the effects of integer shifts of the scaling 
function or wavelet coefficients h(n) or h₁(n). In a particular situation, these sequences 
may be shifted to make the corresponding FIR filter causal. 


Alternate Normalizations 


An alternate normalization of the scaling coefficients is used by some authors. In some 
ways, it is a cleaner form than that used here, but it does not state the basic recursion as a 
normalized expansion, and it does not result in a unity norm for h(n). The alternate 
normalization uses the basic multiresolution recursive equation with no √2 

Equation: 

$$ \varphi(t) = \sum_n h(n)\, \varphi(2t-n). $$


Some of the relationships and results using this normalization are: 
Equation: 

$$
\begin{aligned}
\sum_n h(n) &= 2 \\
\sum_n |h(n)|^2 &= 2 \\
\sum_n h(n)\, h(n-2k) &= 2\,\delta(k) \\
\sum_n h(2n) = \sum_n h(2n+1) &= 1
\end{aligned}
$$


A still different normalization occasionally used has a factor of 2 in [link] rather than √2 
or unity, giving Σ_n h(n) = 1. Other obvious modifications of the results in other places 
in this book can be worked out. Take care in using scaling coefficients h(n) from the 
literature as some must be multiplied or divided by √2 to be consistent with this book. 
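
Since the three normalizations differ only by a constant factor, coefficients taken from another source can be brought to the convention of this book by rescaling their sum to √2. The helper below is a minimal sketch with a hypothetical name; it assumes the input is a valid set of scaling coefficients in one of the conventions above and does not check the quadratic conditions.

```python
import math

def to_sqrt2_normalization(h):
    """Rescale h so that sum(h) equals sqrt(2), whatever constant it summed to."""
    total = sum(h)
    if abs(total) < 1e-12:
        raise ValueError("coefficients sum to zero; not a scaling filter")
    return [c * math.sqrt(2) / total for c in h]

# Haar coefficients in the "sum = 2" and "sum = 1" conventions.
haar_sum2 = [1.0, 1.0]
haar_sum1 = [0.5, 0.5]
print(to_sqrt2_normalization(haar_sum2))
print(to_sqrt2_normalization(haar_sum1))
```

Both calls print the same pair, approximately 0.7071 twice, which has unit energy as expected for the normalization used here.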


Example Scaling Functions and Wavelets 


Several of the modern wavelets had never been seen or described before the 1980's. This 
section looks at some of the most common wavelet systems. 


Haar Wavelets 


The oldest and most basic of the wavelet systems that has most of our desired properties 
is constructed from the Haar basis functions. If one chooses a length N = 2 scaling 
coefficient set, after satisfying the necessary conditions in [link] and [link], there are no 
remaining degrees of freedom. The unique (within normalization) coefficients are 
Equation: 

$$ h(n) = \left\{ \frac{1}{\sqrt{2}},\ \frac{1}{\sqrt{2}} \right\} $$

and the resulting normalized scaling function is 
Equation: 

$$ \varphi(t) = \begin{cases} 1 & \text{for } 0 < t < 1 \\ 0 & \text{otherwise.} \end{cases} $$

The wavelet is, therefore, 
Equation: 

$$ \psi(t) = \begin{cases} 1 & \text{for } 0 < t < 1/2 \\ -1 & \text{for } 1/2 < t < 1 \\ 0 & \text{otherwise.} \end{cases} $$


Their satisfying the multiresolution equation [link] is illustrated in Figure: Haar and 
Triangle Scaling Functions. Haar showed that translates and scalings of these functions 
form an orthonormal basis for L²(R). We can easily see that the Haar functions are also 
a compact support orthonormal wavelet system that satisfies Daubechies' conditions [link]. 
Although they are as regular as can be achieved for N = 2, they are not even continuous. 
The orthogonality and nesting of spanned subspaces are easily seen because the translates 
have no overlap in the time domain. It is instructive to apply the various properties 
of [link] and [link] to these functions and see how they are satisfied. They are illustrated 
in the example in Figure: Haar Scaling Functions and Wavelets that Span V_j through 
Figure: Haar Function Approximation in V_j. 


Sinc Wavelets 


The next best known (perhaps the best known) basis set is that formed by the sinc 
functions. The sinc functions are usually presented in the context of the Shannon 
sampling theorem, but we can look at translates of the sinc function as an orthonormal set 
of basis functions (or, in some cases, a tight frame). They, likewise, usually form an 
orthonormal wavelet system satisfying the various required conditions of a 
multiresolution system. 


The sinc function is defined as 
Equation: 

$$ \operatorname{sinc}(t) = \frac{\sin(t)}{t} $$


where sinc(0) = 1. This is a very versatile and useful function because its Fourier 
transform is a simple rectangle function and the Fourier transform of a rectangle function 
is a sinc function. In order to be a scaling function, the sinc must satisfy [link] as 
Equation: 

$$ \operatorname{sinc}(Kt) = \sum_n h(n)\, \operatorname{sinc}(K2t - Kn) $$

for the appropriate scaling coefficients h(n) and some K. If we construct the scaling 
function from the generalized sampling function as presented in [link], the sinc function 
becomes 
Equation: 

$$ \operatorname{sinc}(Kt) = \sum_n \operatorname{sinc}(KTn)\, \operatorname{sinc}\!\left( \frac{\pi}{RT}\, t - \frac{\pi}{R}\, n \right). $$

In order for these two equations to be true, the sampling period must be T = 1/2 and the 
parameter 
Equation: 

$$ K = \frac{\pi}{R} $$

which gives the scaling coefficients as 
Equation: 

$$ h(n) = \operatorname{sinc}\!\left( \frac{\pi}{2R}\, n \right). $$


We see that φ(t) = sinc(Kt) is a scaling function with infinite support and its 
corresponding scaling coefficients are samples of a sinc function. If R = 1, then K = π 
and the scaling function generates an orthogonal wavelet system. For R > 1, the wavelet 
system is a tight frame, the expansion set is not orthogonal or a basis, and R is the 
amount of redundancy in the system as discussed in this chapter. For the orthogonal sinc 
scaling function, the wavelet is simply expressed by 

Equation: 

$$ \psi(t) = 2\, \varphi(2t) - \varphi(t). $$


The sinc scaling function and wavelet do not have compact support, but they do illustrate 
an infinitely differentiable set of functions that result from an infinitely long h(n). The 
orthogonality and multiresolution characteristics of the orthogonal sinc basis are best seen 
in the frequency domain where there is no overlap of the spectra. Indeed, the Haar and 
sinc systems are Fourier duals of each other. The sinc generating scaling function and 
wavelet are shown in [link]. 


Figure: Sinc Scaling Function and Wavelet (panels: Sinc Scaling Function; Sinc Wavelet) 


Spline and Battle-Lemarié Wavelet Systems 


The triangle scaling function illustrated in Figure: Haar and Triangle Scaling Functions is 
a special case of a more general family of spline scaling functions. The scaling coefficient 
system h(n) = {1/(2√2), 1/√2, 1/(2√2), 0} gives rise to the piecewise linear, continuous 
triangle scaling function. This function is a first-order spline, being a concatenation of 
two first-order polynomials joined to be continuous at the junctions or “knots”. A quadratic 
spline is generated from h = {1/4, 3/4, 3/4, 1/4}/√2 as three sections of second-order 
polynomials connected to give continuous first-order derivatives at the junctions. The 
cubic spline is generated from h(n) = {1/8, 1/2, 3/4, 1/2, 1/8}/√2, which sums to √2 as 
the linear admissibility condition requires. This is 
generalized to an arbitrary Nth order spline with continuous (N − 1)th order derivatives 
and with compact support of N + 1. These functions have excellent mathematical 
properties, but they are not orthogonal over integer translation. If orthogonalized, their 
support becomes infinite (but rapidly decaying) and they generate the “Battle-Lemarié 
wavelet system” [link], [link], [link], [link]. [link] illustrates the first-order spline scaling 
function which is the triangle function along with the second-, third-, and fourth-order 
spline scaling functions. 
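
The spline scaling coefficients follow a binomial pattern, which makes it easy to generate them and to see numerically why the integer translates are not orthogonal. The sketch below is illustrative only: the function names are hypothetical, and the closed form C(order + 1, k)/2^order divided by √2 is the standard B-spline two-scale mask expressed in the normalization Σ h(n) = √2 used in this book.

```python
import math

def spline_scaling_coefficients(order):
    """Binomial mask C(order+1, k) / 2**order, divided by sqrt(2): sums to sqrt(2)."""
    return [math.comb(order + 1, k) / 2 ** order / math.sqrt(2)
            for k in range(order + 2)]

def even_shift_correlation(h, k):
    """sum_n h(n) h(n - 2k) for k >= 0; equals delta(k) for an orthogonal system."""
    return sum(h[n] * h[n - 2 * k] for n in range(2 * k, len(h)))

triangle = spline_scaling_coefficients(1)   # {1/(2 sqrt 2), 1/sqrt 2, 1/(2 sqrt 2)}
cubic = spline_scaling_coefficients(3)      # {1/8, 1/2, 3/4, 1/2, 1/8} / sqrt 2

print(sum(triangle))                         # linear condition: sqrt(2)
print(even_shift_correlation(triangle, 0))   # 0.75 rather than 1
print(even_shift_correlation(triangle, 1))   # 0.125 rather than 0
```

The linear condition holds for every order, but the quadratic (orthogonality) conditions fail, which is the reason an orthogonalization step is needed to reach the Battle-Lemarié system.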
Figure: Spline Scaling Functions (panels: (a) φ_S1, 1st order spline; (c) φ_S3, 3rd order spline; (d) φ_S4, 4th order spline) 


Further Properties of the Scaling Function and Wavelet 


The scaling function and wavelet have some remarkable properties that should be 
examined in order to understand wavelet analysis and to gain some intuition for these 
systems. Likewise, the scaling and wavelet coefficients have important properties that 
should be considered. 


We now look further at the properties of the scaling function and the wavelet in terms of 
the basic defining equations and restrictions. We also consider the relationship of the 
scaling function and wavelet to the equation coefficients. A multiplicity or rank of two is 
used here but the more general multiplicity-M case is easily derived from these (See 


Derivations or proofs for some of these properties are included in Appendix B. 


The basic recursive equation for the scaling function, defined in [link] as 
Equation: 

$$ \varphi(t) = \sum_n h(n)\, \sqrt{2}\, \varphi(2t-n), $$

is homogeneous, so its solution is unique only within a normalization factor. In most 
cases, both the scaling function and wavelet are normalized to unit energy or unit norm. 
In the properties discussed here, we normalize the energy as E = ∫ |φ(t)|² dt = 1. 
Other normalizations can easily be used if desired. 


General Properties not Requiring Orthogonality 


There are several properties that are simply a result of the multiresolution equation [link] 
and, therefore, hold for orthogonal and biorthogonal systems. 


Property 1 The normalization of φ(t) is arbitrary and is given in [link] as E. Here we 
usually set E = 1 so that the basis functions are orthonormal and coefficients can easily 
be calculated with inner products. 

Equation: 

$$ \int |\varphi(t)|^2\, dt = E = 1 $$

Property 2 Not only can the scaling function be written as a weighted sum of functions in 
the next higher scale space as stated in the basic recursion equation [link], but it can also 
be expressed in higher resolution spaces: 

Equation: 

$$ \varphi(t) = \sum_n h^{(j)}(n)\, 2^{j/2}\, \varphi(2^j t - n) $$

where h^(1)(n) = h(n) and for j ≥ 1 
Equation: 

$$ h^{(j+1)}(n) = \sum_k h^{(j)}(k)\, h(n-2k). $$

Property 3 A formula for the sum of dyadic samples of φ(t) 
Equation: 

$$ \sum_k \varphi\!\left( \frac{k}{2^J} \right) = 2^J $$

Property 4 A “partition of unity” follows from [link] for J = 0 

Equation: 

$$ \sum_m \varphi(m) = 1 $$

Property 5 A generalized partition of unity exists if φ(t) is continuous 
Equation: 

$$ \sum_m \varphi(t-m) = 1 $$


Property 6 A frequency domain statement of the basic recursion equation [link] 
Equation: 

$$ \Phi(\omega) = \frac{1}{\sqrt{2}}\, H(\omega/2)\, \Phi(\omega/2) $$

Property 7 Successive approximations in the frequency domain are often easier to analyze 
than the time domain version in [link]. The convergence properties of this infinite product 
are very important. 

Equation: 

$$ \Phi(\omega) = \prod_{k=1}^{\infty} \left\{ \frac{1}{\sqrt{2}}\, H\!\left( \frac{\omega}{2^k} \right) \right\} \Phi(0) $$

This formula is derived in [link]. 


Properties that Depend on Orthogonality 


The following properties depend on the orthogonality of the scaling and wavelet 
functions. 


Property 8 The square of the integral of φ(t) is equal to the integral of the square of 
φ(t), or A₀² = E. 

Equation: 

$$ \left[ \int \varphi(t)\, dt \right]^2 = \int \varphi(t)^2\, dt $$


Property 9 The integral of the wavelet is necessarily zero 

Equation: 

$$ \int \psi(t)\, dt = 0 $$

The norm of the wavelet is usually normalized to one such that ∫ |ψ(t)|² dt = 1. 

Property 10 Not only are integer translates of the wavelet orthogonal; different scales 
are also orthogonal. 
Equation: 

$$ \int 2^{j/2}\, \psi(2^j t - k)\; 2^{i/2}\, \psi(2^i t - \ell)\, dt = \delta(k-\ell)\, \delta(j-i) $$

where the norm of ψ(t) is one. 


Property 11 The scaling function and wavelet are orthogonal over both scale and 
translation. 
Equation: 

$$ \int 2^{j/2}\, \psi(2^j t - k)\; 2^{i/2}\, \varphi(2^i t - \ell)\, dt = 0 $$

for all integers i, j, k, ℓ where j ≥ i. 


Property 12 A frequency domain statement of the orthogonality requirements in [link]. It 
also is a statement of equivalent energy measures in the time and frequency domains as in 
Parseval's theorem, which is true with an orthogonal basis set. 

Equation: 

$$ \sum_k |\Phi(\omega + 2\pi k)|^2 = \int |\Phi(\omega)|^2\, d\omega = \int |\varphi(t)|^2\, dt = 1 $$


Property 13 The scaling coefficients can be calculated from the orthogonal or tight 
frame scaling functions by 
Equation: 

$$ h(n) = \sqrt{2} \int \varphi(t)\, \varphi(2t-n)\, dt. $$

Property 14 The wavelet coefficients can be calculated from the orthogonal or tight 
frame scaling functions by 
Equation: 

$$ h_1(n) = \sqrt{2} \int \psi(t)\, \varphi(2t-n)\, dt. $$


Derivations of some of these properties can be found in Appendix B. Properties in 
equations [link], [link], [link], [link], [link], [link], and [link] are independent of any 
normalization of φ(t). Normalization affects the others. Those in equations [link], [link], 
[link], [link], [link], [link], and [link] do not require orthogonality of integer translates of 
φ(t). Those in [link], [link], [link], [link], [link], [link], [link] require orthogonality. No 
properties require compact support. Many of the derivations interchange order of 
summations or of summation and integration. Conditions for those interchanges must be 
met. 


Parameterization of the Scaling Coefficients 


The case where φ(t) and h(n) have compact support is very important. It aids in the time 
localization properties of the DWT and often reduces the computational requirements of 
calculating the DWT. If h(n) has compact support, then the filters described in Chapter: 
Filter Banks and the Discrete Wavelet Transform are simple FIR filters. We have stated 
that N, the length of the sequence h(n), must be even and h(n) must satisfy the linear 
constraint of [link] and the N/2 bilinear constraints of [link]. This leaves N/2 − 1 degrees of 
freedom in choosing h(n) that will still guarantee the existence of φ(t) and a set of 
essentially orthogonal basis functions generated from φ(t). 


Length-2 Scaling Coefficient Vector 


For a length-2 h(n), there are no degrees of freedom left after satisfying the required 
conditions in [link] and [link]. These requirements are 
Equation: 

$$ h(0) + h(1) = \sqrt{2} $$

and 
Equation: 

$$ h^2(0) + h^2(1) = 1 $$

which are uniquely satisfied by 
Equation: 

$$ h_{D2} = \{ h(0), h(1) \} = \left\{ \frac{1}{\sqrt{2}},\ \frac{1}{\sqrt{2}} \right\}. $$

These are the Haar scaling function coefficients which are also the length-2 Daubechies 
coefficients [link] used as an example in Chapter: A multiresolution formulation of 
Wavelet Systems and discussed later in this book. 


Length-4 Scaling Coefficient Vector 


For the length-4 coefficient sequence, there is one degree of freedom or one parameter 
that gives all the coefficients that satisfy the required conditions: 

Equation: 

$$ h(0) + h(1) + h(2) + h(3) = \sqrt{2}, $$

Equation: 

$$ h^2(0) + h^2(1) + h^2(2) + h^2(3) = 1 $$

and 
Equation: 

$$ h(0)\, h(2) + h(1)\, h(3) = 0. $$

Letting the parameter be the angle α, the coefficients become 
Equation: 

$$
\begin{aligned}
h(0) &= (1 - \cos\alpha + \sin\alpha) / (2\sqrt{2}) \\
h(1) &= (1 + \cos\alpha + \sin\alpha) / (2\sqrt{2}) \\
h(2) &= (1 + \cos\alpha - \sin\alpha) / (2\sqrt{2}) \\
h(3) &= (1 - \cos\alpha - \sin\alpha) / (2\sqrt{2}).
\end{aligned}
$$


These equations also give the length-2 Haar coefficients [link] for α = 0, π/2, 3π/2 and 
a degenerate condition for α = π. We get the Daubechies coefficients (discussed later in 
this book) for α = π/3. These Daubechies-4 coefficients have a particularly clean form, 
Equation: 

$$ h_{D4} = \left\{ \frac{1+\sqrt{3}}{4\sqrt{2}},\ \frac{3+\sqrt{3}}{4\sqrt{2}},\ \frac{3-\sqrt{3}}{4\sqrt{2}},\ \frac{1-\sqrt{3}}{4\sqrt{2}} \right\} $$
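
The one-parameter family is easy to explore numerically. The following minimal sketch (the function name is hypothetical) evaluates the four coefficients for a given angle α and confirms the three required conditions, as well as the special cases mentioned above.

```python
import math

def length4_coefficients(alpha):
    """Length-4 scaling coefficients h(0..3) as a function of the angle alpha."""
    c, s = math.cos(alpha), math.sin(alpha)
    d = 2 * math.sqrt(2)
    return [(1 - c + s) / d, (1 + c + s) / d, (1 + c - s) / d, (1 - c - s) / d]

def constraint_residuals(h):
    """Residuals of the linear condition and the two quadratic conditions."""
    return (sum(h) - math.sqrt(2),
            sum(x * x for x in h) - 1.0,
            h[0] * h[2] + h[1] * h[3])

daub4 = length4_coefficients(math.pi / 3)   # Daubechies length-4 coefficients
haar = length4_coefficients(0.0)            # {0, 1/sqrt 2, 1/sqrt 2, 0}: shifted Haar
print(daub4)
print(constraint_residuals(length4_coefficients(1.234)))  # all near zero
```

Sweeping α over a grid and feeding each result to one of the scaling function calculations described later in this chapter is a simple way to see how the shape of φ(t) changes across the family.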


Length-6 Scaling Coefficient Vector 


For a length-6 coefficient sequence h(n), the two parameters are defined as α and β and 
the resulting coefficients are 

Equation: 

$$
\begin{aligned}
h(0) &= [(1 + \cos\alpha + \sin\alpha)(1 - \cos\beta - \sin\beta) + 2 \sin\beta \cos\alpha] / (4\sqrt{2}) \\
h(1) &= [(1 - \cos\alpha + \sin\alpha)(1 + \cos\beta - \sin\beta) - 2 \sin\beta \cos\alpha] / (4\sqrt{2}) \\
h(2) &= [1 + \cos(\alpha-\beta) + \sin(\alpha-\beta)] / (2\sqrt{2}) \\
h(3) &= [1 + \cos(\alpha-\beta) - \sin(\alpha-\beta)] / (2\sqrt{2}) \\
h(4) &= 1/\sqrt{2} - h(0) - h(2) \\
h(5) &= 1/\sqrt{2} - h(1) - h(3)
\end{aligned}
$$

Here the Haar coefficients are generated for any α = β and the length-4 coefficients 
[link] result if β = 0 with α being the free parameter. The length-4 Daubechies 
coefficients are calculated for α = π/3 and β = 0. The length-6 Daubechies coefficients 
result from α = 1.35980373244182 and β = −0.78210638474440. 


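The inverse of these formulas, which give α and β from the allowed h(n), are 
Equation: 

$$ \alpha = \arctan\!\left( \frac{ 2\,(h(0)^2 + h(1)^2) - 1 + (h(2) + h(3))/\sqrt{2} }{ 2\,(h(1)\,h(2) - h(0)\,h(3)) + \sqrt{2}\,(h(0) - h(1)) } \right) $$

Equation: 

$$ \beta = \alpha - \arctan\!\left( \frac{ h(2) - h(3) }{ h(2) + h(3) - 1/\sqrt{2} } \right) $$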


As α and β range over −π to π all possible h(n) are generated. This allows informative 
experimentation to better see what these compactly supported wavelets look like. This 
parameterization is implemented in the Matlab programs in Appendix C and in the 
Aware, Inc. software, UltraWave [link]. 
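
A short Python sketch of the length-6 parameterization and its inverse is given here for experimentation; it is not the Appendix C program, and the function names are hypothetical. One assumption goes beyond the formulas as written: the inverse uses the two-argument arctangent (`math.atan2`) applied to numerator and denominator separately, so that angles outside (−π/2, π/2) are recovered in the correct quadrant.

```python
import math

def length6_coefficients(alpha, beta):
    """Length-6 scaling coefficients h(0..5) from the two angles alpha and beta."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cd, sd = math.cos(alpha - beta), math.sin(alpha - beta)
    r2 = math.sqrt(2)
    h0 = ((1 + ca + sa) * (1 - cb - sb) + 2 * sb * ca) / (4 * r2)
    h1 = ((1 - ca + sa) * (1 + cb - sb) - 2 * sb * ca) / (4 * r2)
    h2 = (1 + cd + sd) / (2 * r2)
    h3 = (1 + cd - sd) / (2 * r2)
    return [h0, h1, h2, h3, 1 / r2 - h0 - h2, 1 / r2 - h1 - h3]

def length6_parameters(h):
    """Recover (alpha, beta) from admissible h; atan2 is used to fix the quadrant."""
    r2 = math.sqrt(2)
    num = 2 * (h[0] ** 2 + h[1] ** 2) - 1 + (h[2] + h[3]) / r2
    den = 2 * (h[1] * h[2] - h[0] * h[3]) + r2 * (h[0] - h[1])
    alpha = math.atan2(num, den)
    beta = alpha - math.atan2(h[2] - h[3], h[2] + h[3] - 1 / r2)
    return alpha, beta

ALPHA_D6, BETA_D6 = 1.35980373244182, -0.78210638474440
daub6 = length6_coefficients(ALPHA_D6, BETA_D6)
print(daub6)
print(length6_parameters(daub6))   # recovers the two angles above
```

For the Daubechies length-6 case the second arctangent has a negative denominator, so a single-argument arctangent would return β shifted by π; this is the situation the two-argument form is meant to handle.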


Since the scaling functions and wavelets are used with integer translations, the location of 
their support is not important, only the size of the support. Some authors shift h(n), 
h₁(n), φ(t), and ψ(t) to be approximately centered around the origin. This is achieved 
by having the initial nonzero scaling coefficient start at n = −N/2 + 1 rather than zero. 
We prefer to have the origin at n = t = 0. 


Matlab programs that calculate h(n) for N = 2, 4, 6 are furnished in Appendix C. They 
calculate h(n) from α and β according to [link], [link], and [link]. They also work 
backwards to calculate α and β from allowable h(n) using [link]. A program is also 
included that calculates the Daubechies coefficients for any length using the spectral 
factorization techniques in [link] and Chapter: Regularity, Moments, and Wavelet System 
Design of this book. 


Longer h(n) sequences are more difficult to parameterize but can be done with the 
techniques of Pollen [link] and Wells [link] or the lattice factorization by Vaidyanathan 
[link] developed in Chapter: Filter Banks and Transmultiplexers. Selesnick derived 
explicit formulas for N = 8 using the symbolic software system, Maple, and set up the 
formulation for longer lengths [link]. It is over the space of these independent parameters 
that one can find optimal wavelets for a particular problem or class of signals [link], 
[link]. 


Calculating the Basic Scaling Function and Wavelet 


Although one never explicitly uses the scaling function or wavelet (one uses the scaling 
and wavelet coefficients) in most practical applications, it is enlightening to consider 
methods to calculate φ(t) and ψ(t). There are two approaches that we will discuss. The 
first is a form of successive approximations that is used theoretically to prove existence 
and uniqueness of φ(t) and can also be used to actually calculate them. This can be done 
in the time domain to find φ(t) or in the frequency domain to find the Fourier transform 
of φ(t) which is denoted Φ(ω). The second method solves for the exact values of φ(t) on 
the integers by solving a set of simultaneous equations. From these values, it is possible 
to then exactly calculate values at the half integers, then at the quarter integers and so on, 
giving values of φ(t) on what are called the dyadic rationals. 


Successive Approximations or the Cascade Algorithm 


In order to solve the basic recursion equation [link], we propose an iterative algorithm 
that will generate successive approximations to φ(t). If the algorithm converges to a 
fixed point, then that fixed point is a solution to [link]. The iterations are defined by 
Equation: 

$$ \varphi^{(k+1)}(t) = \sum_{n=0}^{N-1} h(n)\, \sqrt{2}\, \varphi^{(k)}(2t-n) $$

for the kth iteration where an initial φ^(0)(t) must be given. Because this can be viewed as 
applying the same operation over and over to the output of the previous application, it is 
sometimes called the cascade algorithm. 
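
The iteration can be carried out exactly when the starting function is piecewise constant. The sketch below is not the Appendix C program; it assumes the initial φ^(0)(t) is the unit box on [0, 1) and stores φ^(k)(t) as its constant values on intervals of width 2^(−k), so that one iteration becomes a convolution of the stored values with h(n) spread out by 2^k samples. The function name and this choice of starting function are assumptions made for illustration.

```python
import math

def cascade(h, iterations):
    """Run the cascade iteration starting from the unit box on [0, 1).

    Returns a list v with v[m] equal to phi^(K)(t) on [m / 2**K, (m + 1) / 2**K),
    where K = iterations.
    """
    values = [1.0]                      # phi^(0): box function on [0, 1)
    root2 = math.sqrt(2)
    for k in range(iterations):
        step = 2 ** k                   # shift by n in t equals n * 2**k old samples
        new = [0.0] * (len(values) + (len(h) - 1) * step)
        for n, hn in enumerate(h):
            for m, v in enumerate(values):
                new[m + n * step] += root2 * hn * v
        values = new
    return values

r2, r3 = math.sqrt(2), math.sqrt(3)
haar = [1 / r2, 1 / r2]
daub4 = [(1 + r3) / (4 * r2), (3 + r3) / (4 * r2),
         (3 - r3) / (4 * r2), (1 - r3) / (4 * r2)]

K = 8
phi = cascade(daub4, K)
print(len(phi), sum(phi) / 2 ** K)      # support approaches [0, 3]; integral stays 1
```

For the Haar coefficients the box function is a fixed point, so every iteration returns all ones; for other admissible h(n) the integral of each iterate stays equal to that of the starting function, which is a useful sanity check.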


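Using definitions [link] and [link], the frequency domain form becomes 
Equation: 

$$ \Phi^{(k+1)}(\omega) = \frac{1}{\sqrt{2}}\, H\!\left( \frac{\omega}{2} \right) \Phi^{(k)}\!\left( \frac{\omega}{2} \right) $$

and the limit can be written as an infinite product in the form 
Equation: 

$$ \Phi^{(\infty)}(\omega) = \left[ \prod_{k=1}^{\infty} \frac{1}{\sqrt{2}}\, H\!\left( \frac{\omega}{2^k} \right) \right] \Phi^{(\infty)}(0). $$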

If this limit exists, the Fourier transform of the scaling function is 


Equation: 


The limit does not depend on the shape of the initial φ^(0)(t), but only on 
Φ^(k)(0) = ∫ φ^(k)(t) dt = A₀, which is invariant over the iterations. This only makes 
sense if the limit of Φ(ω) is well-defined as when it is continuous at ω = 0. 


The Matlab program in Appendix C implements the algorithm in [link] which converges 
reliably to φ(t), even when it is very discontinuous. From this scaling function, the 
wavelet can be generated from [link]. It is interesting to try this algorithm, plotting the 
function at each iteration, on both admissible h(n) that satisfy [link] and [link] and on 
inadmissible h(n). The calculation of a scaling function for N = 4 is shown at each 
iteration in [link]. 


Because of the iterative form of this algorithm, applying the same process over and over, 
it is sometimes called the cascade algorithm [link], [link]. 


Iterating the Filter Bank 


An interesting method for calculating the scaling function also uses an iterative procedure 
which consists of the stages of the filter structure of Chapter: Filter Banks and the 
Discrete Wavelet Transform which calculates wavelet expansion coefficients (DWT 
values) at one scale from those at another. A scaling function and wavelet expansion of a 
scaling function itself would be a single nonzero coefficient at the scale of j = 1. Passing 
this single coefficient through the synthesis filter structure of Figure: Two-Stage Two- 
Band Synthesis Tree and [link] would result in a fine scale output that for large j would 
essentially be samples of the scaling function. 
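
A minimal sketch of this idea, under stated assumptions, follows. Only the lowpass branch of the synthesis bank is iterated (the wavelet branches receive zeros), each stage is modeled as up-sampling by two followed by convolution with h(n), and the output after J stages is multiplied by 2^(J/2) so that it can be read as approximate samples of φ(t) at spacing 2^(−J). The function name and the scaling convention are assumptions for illustration, not the book's program.

```python
import math

def iterate_synthesis(h, stages):
    """Pass a single unit coefficient through `stages` lowpass synthesis stages.

    Returns 2**(stages/2) times the output, approximating phi(m / 2**stages).
    """
    c = [1.0]
    for _ in range(stages):
        up = [0.0] * (2 * len(c) - 1)
        up[::2] = c                          # up-sample by two (insert zeros)
        out = [0.0] * (len(up) + len(h) - 1)
        for i, u in enumerate(up):           # convolve with h(n)
            for n, hn in enumerate(h):
                out[i + n] += u * hn
        c = out
    scale = 2 ** (stages / 2)
    return [scale * v for v in c]

r2, r3 = math.sqrt(2), math.sqrt(3)
haar = [1 / r2, 1 / r2]
daub4 = [(1 + r3) / (4 * r2), (3 + r3) / (4 * r2),
         (3 - r3) / (4 * r2), (1 - r3) / (4 * r2)]

J = 8
samples = iterate_synthesis(daub4, J)
print(len(samples), sum(samples) / 2 ** J)   # Riemann sum of the samples is 1
```

With the Haar coefficients the scaled output is a run of ones of length 2^J, the sampled box function.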


Successive approximation in the frequency domain 


The Fourier transform of the scaling function defined in [link] is an important tool for 
studying and developing wavelet theory. It could be approximately calculated by taking 
the DFT of the samples of φ(t) but a more direct approach is available using the infinite 
product in [link]. From this formulation we can see how the zeros of H(ω) determine the 
zeros of Φ(ω). The existence conditions in Theorem 5 require H(π) = 0 or, more 
generally, H(ω) = 0 for ω = (2k + 1)π. Equation [link] gives the relation of these zeros 
of H(ω) to the zeros of Φ(ω). For the factor with index j = 1, H(ω/2) = 0 at 
ω = 2(2k + 1)π. For j = 2, H(ω/4) = 0 at ω = 4(2k + 1)π; for j = 3, H(ω/8) = 0 
at ω = 8(2k + 1)π, etc. Because [link] is a product of stretched versions of H(ω), these 
zeros of H(ω/2^j) are the zeros of the Fourier transform of φ(t). Recall from Theorem 
15 that H(ω) has no zeros in −π/3 < ω < π/3. All of this gives a picture of the shape of 
Φ(ω) and the location of its zeros. From an asymptotic analysis of Φ(ω) as ω → ∞, one 
can study the smoothness of φ(t). 

Figure: Iterations of the Successive Approximations for φ_D4 


A Matlab program that calculates Φ(ω) using this frequency domain successive 
approximations approach suggested by [link] is given in Appendix C. Studying this 
program gives further insight into the structure of Φ(ω). Rather than starting the 
calculations given in [link] for the index j = 1, they are started for the largest j = J and 
worked backwards. If we calculate a length-N DFT consistent with j = J using the FFT, 
then the samples of H(ω/2^j) for j = J − 1 are simply every other sample of the case 
for j = J. The next stage for j = J − 2 is done likewise and if the original N is chosen a 
power of two, the process is continued down to j = 1 without calculating any more 
FFTs. This results in a very efficient algorithm. The details are in the program itself. 


This algorithm is so efficient that using it plus an inverse FFT might be a good way to 
calculate φ(t) itself. Examples of the algorithm are illustrated in [link] where the 
transform is plotted for each step of the iteration. 
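
For a single frequency, the infinite product can simply be truncated after a fixed number of factors. The sketch below does this directly, without the FFT-based reuse of samples described above, so it is a plain illustration rather than the efficient algorithm. The function name, the default number of factors, and the normalization Φ(0) = 1 are assumptions.

```python
import cmath
import math

def scaling_function_spectrum(h, w, factors=40):
    """Truncated product of H(w / 2**k) / sqrt(2) for k = 1..factors, with Phi(0) = 1."""
    value = 1.0 + 0.0j
    for k in range(1, factors + 1):
        wk = w / 2 ** k
        H = sum(c * cmath.exp(-1j * wk * n) for n, c in enumerate(h))
        value *= H / math.sqrt(2)
    return value

r2, r3 = math.sqrt(2), math.sqrt(3)
haar = [1 / r2, 1 / r2]
daub4 = [(1 + r3) / (4 * r2), (3 + r3) / (4 * r2),
         (3 - r3) / (4 * r2), (1 - r3) / (4 * r2)]

# For Haar the magnitude is |sin(w/2) / (w/2)|, the transform of the unit box.
print(abs(scaling_function_spectrum(haar, 1.0)), math.sin(0.5) / 0.5)
# Zeros of H at odd multiples of pi show up as zeros of Phi at 2*pi, 4*pi, ...
print(abs(scaling_function_spectrum(daub4, 2 * math.pi)))
```

The second printed value is zero to rounding error because the first factor of the product is H(π)/√2.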
Figure: Iterations of the Successive Approximations for Φ(ω) 


The Dyadic Expansion of the Scaling Function 


The next method for evaluating the scaling function uses a completely different approach. 
It starts by calculating the values of the scaling function at integer values of t, which can 
be done exactly (within our ability to solve simultaneous linear equations). Consider the 
basic recursion equation [link] for integer values of t = k 

Equation: 

$$ \varphi(k) = \sum_n h(n)\, \sqrt{2}\, \varphi(2k-n), $$

and assume h(n) ≠ 0 for 0 ≤ n ≤ N − 1. 


This is the refinement matrix illustrated in [link] for N = 6 which we write in matrix 
form as 
Equation: 

$$ M_0\, \underline{\varphi} = \underline{\varphi}, \qquad [M_0]_{k,m} = \sqrt{2}\, h(2k-m). $$

In other words, the vector of φ(k) is the eigenvector of M₀ for an eigenvalue of unity. 
The simple sum of Σ_n h(n) = √2 in [link] does not guarantee that M₀ always has such 
an eigenvalue, but Σ_n h(2n) = Σ_n h(2n + 1) in [link] does guarantee a unity 
eigenvalue. This means that if [link] is not satisfied, φ(t) is not defined on the dyadic 
rationals and is, therefore, probably not a very nice signal. 


Our problem is to now find that eigenvector. Note from [link] that 
φ(0) = φ(N − 1) = 0 or h(0) = h(N − 1) = 1/√2. For the Haar wavelet system, 
the second is true but for longer systems, this would mean all the other h(n) would have 
to be zero because of [link] and that is not only not interesting, it produces a very poorly 
behaved φ(t). Therefore, the scaling function with N > 2 and compact support will 
always be zero on the extremes of the support. This means that we can look for the 
eigenvector of the smaller 4 by 4 matrix obtained by eliminating the first and last rows 
and columns of M₀. 


From [link] we form [M₀ − I]φ = 0 which shows that [M₀ − I] is singular, meaning its 
rows are not independent. We remove the last row and assume the remaining rows are 
now independent. If that is not true, we remove another row. We next replace that row 
with a row of ones in order to implement the normalizing equation 

Equation: 

$$ \sum_k \varphi(k) = 1. $$

This augmented matrix, [M₀ − I] with a row replaced by a row of ones, when multiplied 
by φ gives a vector of all zeros except for a one in the position of the replaced row. This 
equation should not be singular and is solved for φ which gives φ(k), the scaling 
function evaluated at the integers. 


From these values of φ(t) on the integers, we can find the values at the half integers 
using the recursive equation [link] or a modified form 
Equation: 

$$ \varphi(k/2) = \sum_n h(n)\, \sqrt{2}\, \varphi(k-n) $$

This is illustrated with the matrix equation [link] as 
Equation: 

$$ M_1\, \underline{\varphi} = \underline{\varphi}_2 $$

Here, the first and last columns and last row are not needed (because 
φ₀ = φ₅ = φ_{11/2} = 0) and can be eliminated to save some arithmetic. 


The procedure described here can be repeated to find a matrix that when multiplied by a 
vector of the scaling function evaluated at the odd integers divided by k will give the 
values at the odd integers divided by 2k. This modified matrix corresponds to convolving 
the samples of φ(t) by an up-sampled h(n). Again, convolution combined with up- and 
down-sampling is the basis of wavelet calculations. It is also the basis of digital filter 
bank theory. [link] illustrates the dyadic expansion calculation of a Daubechies scaling 
function for N = 4 at each iteration of this method. 
Figure: Iterations of the Dyadic Expansion for φ_D4 


Not only does this dyadic expansion give an explicit method for finding the exact values 
of φ(t) on the dyadic rationals (t = k/2^j), but it shows how the eigenvalues of M say 
something about φ(t). Clearly, if φ(t) is continuous, it says everything. 
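
The whole procedure fits in a short program. The sketch below is an illustration in Python rather than the Appendix C Matlab code; the function names are hypothetical, it assumes N > 2 so that φ(0) = φ(N − 1) = 0, and it always replaces the last interior row by the row of ones, assuming the remaining rows are independent as discussed above.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def scaling_function_at_integers(h):
    """phi(0), ..., phi(N-1) from the interior part of [M0 - I] plus normalization."""
    N, r2 = len(h), math.sqrt(2)
    coeff = lambda i: h[i] if 0 <= i < N else 0.0
    inner = range(1, N - 1)
    a = [[r2 * coeff(2 * k - m) - (1.0 if k == m else 0.0) for m in inner]
         for k in inner]
    a[-1] = [1.0] * len(a)                 # normalizing equation: sum phi(k) = 1
    b = [0.0] * (len(a) - 1) + [1.0]
    return [0.0] + solve_linear(a, b) + [0.0]

def dyadic_expansion(h, levels):
    """Return phi(m / 2**levels) for m = 0 .. (N - 1) * 2**levels."""
    r2 = math.sqrt(2)
    vals = scaling_function_at_integers(h)
    for j in range(levels):
        step = 2 ** j                      # old samples per unit of t
        fine = [0.0] * (2 * len(vals) - 1)
        fine[::2] = vals                   # values already known
        for m in range(1, len(fine), 2):   # new points: odd multiples of 2**-(j+1)
            fine[m] = r2 * sum(h[n] * vals[m - n * step]
                               for n in range(len(h))
                               if 0 <= m - n * step < len(vals))
        vals = fine
    return vals

r3 = math.sqrt(3)
daub4 = [(1 + r3) / (4 * math.sqrt(2)), (3 + r3) / (4 * math.sqrt(2)),
         (3 - r3) / (4 * math.sqrt(2)), (1 - r3) / (4 * math.sqrt(2))]
print(scaling_function_at_integers(daub4))   # phi(1) = (1 + sqrt 3)/2, phi(2) = (1 - sqrt 3)/2
```

For the length-4 Daubechies coefficients the interior system is only 2 by 2, and the sum of the samples at level J equals 2^J, in agreement with the dyadic sum property stated earlier.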


Matlab programs are included in Appendix C to implement the successive approximation 
and dyadic expansion approaches to evaluating the scaling function from the scaling 
coefficients. They were used to generate the figures in this section. It is very illuminating 
to experiment with different h(n) and observe the effects on φ(t) and ψ(t). 


Regularity, Moments, and Wavelet System Design 


We now look at a particular way to use the remaining N/2 − 1 degrees of freedom to design the N 
values of h(n) after satisfying [link] and [link], which ensure the existence and orthogonality (or 
property of being a tight frame) of the scaling function and wavelets [link], [link], [link]. 


One of the interesting characteristics of the scaling functions and wavelets is that while satisfying 
[link] and [link] will guarantee the existence of an integrable scaling function, it may be 
extraordinarily irregular, even fractal in nature. This may be an advantage in analyzing rough or fractal 
signals but it is likely to be a disadvantage for most signals and images. 


We will see in this section that the number of vanishing moments of h₁(n) and ψ(t) are related to the 
smoothness or differentiability of φ(t) and ψ(t). Unfortunately, smoothness is difficult to determine 
directly because, unlike with differential equations, the defining recursion [link] does not involve 
derivatives. 


We also see that the representation and approximation of polynomials are related to the number of 
vanishing or minimized wavelet moments. Since polynomials are often a good model for certain 
signals and images, this property is both interesting and important. 


The number of zero scaling function moments is related to the “goodness" of the approximation of 
high-resolution scaling coefficients by samples of the signal. They also affect the symmetry and 
concentration of the scaling function and wavelets. 


This section will consider the basic 2-band or multiplier-2 case defined in [link]. The more general 
M-band or multiplier-M case is discussed in Section: Multiplicity-M (M-Band) Scaling Functions and 
Wavelets. 


K-Regular Scaling Filters 


Here we start by defining a unitary scaling filter to be an FIR filter with coefficients h(n) from the 
basic recursive [link] satisfying the admissibility conditions from [link] and orthogonality conditions 
from [link] as 

Equation: 

$$ \sum_n h(n) = \sqrt{2} \quad \text{and} \quad \sum_k h(k)\, h(k+2m) = \delta(m). $$


The term “scaling filter" comes from Mallat's algorithm, and the relation to filter banks discussed in 
Chapter: Filter Banks and the Discrete Wavelet Transform. The term “unitary” comes from the 
orthogonality conditions expressed in filter bank language, which is explained in Chapter: Filter Banks 
and Transmultiplexers. 


A unitary scaling filter is said to be K-regular if its z-transform has K zeros at z = e^{iπ}. This looks 
like 
Equation: 

$$ H(z) = \left( \frac{1 + z^{-1}}{2} \right)^{K} Q(z) $$

where H(z) = Σ_n h(n) z^{−n} is the z-transform of the scaling coefficients h(n) and Q(z) has no 
poles or zeros at z = e^{iπ}. Note that we are presenting a definition of regularity of h(n), not of the 
scaling function φ(t) or wavelet ψ(t). They are related but not the same. Note also from [link] that any 
unitary scaling filter is at least K = 1 regular. 


The length of the scaling filter is N which means H(z) is an N − 1 degree polynomial. Since the 
multiple zero at z = −1 is order K, the polynomial Q(z) is degree N − 1 − K. The existence of φ(t) 
requires the zeroth moment be √2 which is the result of the linear condition in [link]. Satisfying the 
conditions for orthogonality requires N/2 conditions which are the quadratic equations in [link]. This 
means the degree of regularity is limited by 

Equation: 

$$ 1 \le K \le \frac{N}{2}. $$

Daubechies used the degrees of freedom to obtain maximum regularity for a given N, or to obtain the 
minimum WN for a given regularity. Others have allowed a smaller regularity and used the resulting 
extra degrees of freedom for other design purposes. 


Regularity is defined in terms of zeros of the transfer function or frequency response function of an
FIR filter made up from the scaling coefficients. This is related to the fact that the differentiability of a
function is tied to how fast its Fourier series coefficients drop off as the index goes to infinity or how
fast the Fourier transform magnitude drops off as frequency goes to infinity. The relation of the Fourier
transform of the scaling function to the frequency response of the FIR filter with coefficients h(n) is
given by the infinite product [link]. From these connections, we reason that since H(z) is lowpass and,
if it has a high order zero at z = −1 (i.e., ω = π), the Fourier transform of φ(t) should drop off
rapidly and, therefore, φ(t) should be smooth. This turns out to be true.


We next define the k-th moments of φ(t) and ψ(t) as
Equation:


\[ m(k) = \int t^{k}\, \varphi(t)\, dt \]


and
Equation:


\[ m_1(k) = \int t^{k}\, \psi(t)\, dt \]


and the discrete k-th moments of h(n) and h_1(n) as
Equation:


\[ \mu(k) = \sum_n n^{k}\, h(n) \]


and
Equation:


\[ \mu_1(k) = \sum_n n^{k}\, h_1(n). \]


The partial moments of h(n) (moments of samples) are defined as
Equation:


\[ \nu(k, \ell) = \sum_n (2n+\ell)^{k}\, h(2n+\ell). \]


Note that μ(k) = ν(k, 0) + ν(k, 1).


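From these equations and the basic recursion [link] we obtain [link]
Equation:


\[ m(k) = \frac{1}{(2^{k}-1)\sqrt{2}} \sum_{\ell=1}^{k} \binom{k}{\ell}\, \mu(\ell)\, m(k-\ell) \]


which can be derived by substituting [link] into [link], changing variables, and using [link]. Similarly,
we obtain
Equation:


\[ m_1(k) = \frac{1}{2^{k}\sqrt{2}} \sum_{\ell=0}^{k} \binom{k}{\ell}\, \mu_1(\ell)\, m(k-\ell). \]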


These equations exactly calculate the moments defined by the integrals in [link] and [link] with simple
finite convolutions of the discrete moments with the lower order continuous moments. A similar
equation also holds for the multiplicity-M case described in Section: Multiplicity-M (M-Band) Scaling
Functions and Wavelets [link]. A Matlab program that calculates the continuous moments from the
discrete moments using [link] and [link] is given in Appendix C.
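The two recursions can be sketched in a few lines. The Python code below is only an illustration and is not the Matlab program of Appendix C; the function names are our own, and the wavelet filter is formed as h_1(n) = (−1)^{n+1} h(N − 1 − n), a sign convention assumed here because it reproduces the h_1(n) columns of the tables in this chapter.

```python
import math
from math import comb

def wavelet_filter(h):
    """h1(n) = (-1)^(n+1) h(N-1-n); sign convention assumed from the tables."""
    N = len(h)
    return [(-1) ** (n + 1) * h[N - 1 - n] for n in range(N)]

def discrete_moments(c, kmax):
    """mu(k) = sum_n n^k c(n) for k = 0..kmax (Python evaluates 0**0 as 1)."""
    return [sum(n ** k * x for n, x in enumerate(c)) for k in range(kmax + 1)]

def continuous_moments(h, kmax):
    """Moments m(k) of phi and m1(k) of psi from the discrete moments."""
    mu = discrete_moments(h, kmax)
    mu1 = discrete_moments(wavelet_filter(h), kmax)
    r2 = math.sqrt(2)
    m = [1.0]  # m(0) = 1 by normalization of the scaling function
    for k in range(1, kmax + 1):
        s = sum(comb(k, l) * mu[l] * m[k - l] for l in range(1, k + 1))
        m.append(s / ((2 ** k - 1) * r2))
    m1 = [sum(comb(k, l) * mu1[l] * m[k - l] for l in range(k + 1))
          / (2 ** k * r2) for k in range(kmax + 1)]
    return mu, mu1, m, m1

# Length-4 Daubechies scaling filter.
h4 = [0.48296291314453, 0.83651630373781,
      0.22414386804201, -0.12940952255126]
mu, mu1, m, m1 = continuous_moments(h4, 3)
for k in range(4):
    print(k, round(mu[k], 7), round(mu1[k], 7), round(m[k], 7), round(m1[k], 7))
```

Only a finite sum over lower-order moments is needed at each step, so no numerical integration of φ(t) or ψ(t) is required.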


Vanishing Wavelet Moments 


Requiring the moments of ψ(t) to be zero has several interesting consequences. The following three
theorems show a variety of equivalent characteristics for the K-regular scaling filter, which relate both
to our desire for smooth scaling functions and wavelets as well as polynomial representation.


Theorem 20 (Equivalent Characterizations of K-Regular Filters) A unitary scaling filter is
K-regular if and only if the following equivalent statements are true:


1. All moments of the wavelet filters are zero, μ_1(k) = 0, for k = 0, 1, ···, (K − 1)

2. All moments of the wavelets are zero, m_1(k) = 0, for k = 0, 1, ···, (K − 1)

3. The partial moments of the scaling filter are equal for k = 0, 1, ···, (K − 1)

4. The frequency response of the scaling filter has a zero of order K at ω = π, i.e. [link].

5. The k-th derivative of the magnitude-squared frequency response of the scaling filter is zero at
ω = 0 for k = 1, 2, ···, 2K − 1.

6. All polynomial sequences up to degree (K − 1) can be expressed as a linear combination of
shifted scaling filters.

7. All polynomials of degree up to (K − 1) can be expressed as a linear combination of shifted
scaling functions at any scale.


This is a very powerful result [link], [link]. It not only ties the number of zero moments to the 
regularity but also to the degree of polynomials that can be exactly represented by a sum of weighted 
and shifted scaling functions. 
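Statements 3 and 6 of Theorem 20 are easy to check numerically for a short filter. The Python sketch below is only an illustration; the function names are our own. It uses the length-4 Daubechies filter (K = 2), compares the two partial moments ν(k, 0) and ν(k, 1), and forms the combination s(n) = Σ_m m h(n − 2m) of shifted scaling filters, which should be a degree-one polynomial sequence in n.

```python
def partial_moment(h, k, l):
    """nu(k, l) = sum_n (2n + l)^k h(2n + l), with l = 0 or 1."""
    return sum(i ** k * c for i, c in enumerate(h) if i % 2 == l)

def shifted_filter_sum(h, n, k):
    """s_k(n) = sum_m m^k h(n - 2m): a combination of shifted scaling filters."""
    total = 0.0
    for i, c in enumerate(h):        # i plays the role of n - 2m
        if (n - i) % 2 == 0:
            m = (n - i) // 2
            total += m ** k * c
    return total

h4 = [0.48296291314453, 0.83651630373781,
      0.22414386804201, -0.12940952255126]

# Statement 3: partial moments agree for k = 0, 1 (K = 2) but not for k = 2.
for k in range(3):
    print(k, partial_moment(h4, k, 0), partial_moment(h4, k, 1))

# Statement 6: s_1(n) is linear in n, so its second difference vanishes.
second_diff = [shifted_filter_sum(h4, n + 2, 1)
               - 2 * shifted_filter_sum(h4, n + 1, 1)
               + shifted_filter_sum(h4, n, 1) for n in range(-3, 4)]
print(second_diff)
```

The link between the two statements is visible in the algebra: s_1(n) = n/(2√2) − ν(1, n mod 2)/2, which is linear in n exactly when the two first-order partial moments are equal.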


Theorem 21 If ψ(t) is K-times differentiable and decays fast enough, then the first K − 1 wavelet
moments vanish [link]; i.e.,
Equation:


\[ \left| \frac{d^{k}}{dt^{k}} \psi(t) \right| < \infty, \quad 0 \le k \le K \]


implies
Equation:


\[ m_1(k) = 0, \quad 0 \le k \le K. \]


Unfortunately, the converse of this theorem is not true. However, we can relate the differentiability of
ψ(t) to vanishing moments by


Theorem 22 There exists a finite positive integer L such that if m_1(k) = 0 for 0 ≤ k ≤ K − 1 then
Equation:


\[ \left| \frac{d^{P}}{dt^{P}} \psi(t) \right| < \infty \]


for L P > K.


For example, a three-times differentiable ψ(t) must have three vanishing moments, but three vanishing
moments results in only one-time differentiability.


These theorems show the close relationship among the moments of h_1(n) and ψ(t), the smoothness of
H(ω) at ω = 0 and π, and polynomial representation. They also state a loose relationship with the
smoothness of φ(t) and ψ(t) themselves.


Daubechies' Method for Zero Wavelet Moment Design 


Daubechies used the above relationships to show the following important result which constructs 
orthonormal wavelets with compact support with the maximum number of vanishing moments. 


Theorem 23 The discrete-time Fourier transform of h(n) having K zeros at ω = π of the form


Equation:


\[ H(\omega) = \left( \frac{1 + e^{i\omega}}{2} \right)^{K} \mathcal{L}(\omega) \]


satisfies
Equation:


\[ |H(\omega)|^{2} + |H(\omega+\pi)|^{2} = 2 \]


if and only if L(ω) = |ℒ(ω)|² can be written
Equation:


\[ L(\omega) = P(\sin^{2}(\omega/2)) \]


with K ≤ N/2 where
Equation:


\[ P(y) = \sum_{k=0}^{K-1} \binom{K-1+k}{k} y^{k} + y^{K} R\left( \tfrac{1}{2} - y \right) \]


and R(y) is an odd polynomial chosen so that P(y) ≥ 0 for 0 ≤ y ≤ 1.


If R = 0, the length N is minimum for a given regularity K = N/2. If N > 2K, the second term
containing R has terms with higher powers of y whose coefficients can be used for purposes other than
regularity.


The proof and a discussion are found in Daubechies [link], [link]. Recall from [link] that H(ω) always
has at least one zero at ω = π as a result of h(n) satisfying the necessary conditions for φ(t) to exist
and have orthogonal integer translates. We are now placing restrictions on h(n) to have as high an
order zero at ω = π as possible. That accounts for the form of [link]. Requiring orthogonality in [link]
gives [link].


Because the frequency domain requirements in [link] are in terms of the square of the magnitudes of
the frequency response, spectral factorization is used to determine H(ω) and therefore h(n) from
|H(ω)|². [link] becomes

Equation:


\[ |H(\omega)|^{2} = \left| \frac{1 + e^{i\omega}}{2} \right|^{2K} |\mathcal{L}(\omega)|^{2}. \]


If we use the functional notation:
Equation:


\[ M(\omega) = |H(\omega)|^{2} \quad \text{and} \quad L(\omega) = |\mathcal{L}(\omega)|^{2} \]


then [link] becomes
Equation:


\[ M(\omega) = |\cos^{2}(\omega/2)|^{K}\, L(\omega). \]


Since M(ω) and L(ω) are even functions of ω they can be written as polynomials in cos(ω) and,
using cos(ω) = 1 − 2 sin²(ω/2), [link] becomes
Equation:


\[ M(\sin^{2}(\omega/2)) = |\cos^{2}(\omega/2)|^{K}\, P(\sin^{2}(\omega/2)) \]


which, after a change of variables of y = sin²(ω/2) = 1 − cos²(ω/2), becomes
Equation:


\[ M(y) = (1-y)^{K}\, P(y) \]


where P(y) is an (N − K) order polynomial which must be positive since it will have to be factored
to find H(ω) from [link]. This now gives [link] in terms of new variables which are easier to use.


In order that this description supports an orthonormal wavelet basis, we now require that [link]
satisfies [link]
Equation:


\[ |H(\omega)|^{2} + |H(\omega+\pi)|^{2} = 2 \]


which using [link] and [link] becomes
Equation:


\[ M(\omega) + M(\omega+\pi) = (1-y)^{K} P(y) + y^{K} P(1-y) = 2. \]


Equations of this form have an explicit solution found by using Bezout's theorem. The details are
developed by Daubechies [link]. If all the (N/2 − 1) degrees of freedom are used to set wavelet
moments to zero, we set K = N/2 and the solution to [link] is given by


Equation:


\[ P(y) = \sum_{k=0}^{K-1} \binom{K-1+k}{k} y^{k} \]


which gives a complete parameterization of Daubechies' maximum zero wavelet moment design. It
also gives a very straightforward procedure for the calculation of the h(n) that satisfy these conditions.
Note that this binomial sum has P(0) = 1, so it satisfies the Bezout identity with a right-hand side of 1;
with the normalization Σ_n h(n) = √2 used here, the right-hand side of 2 is obtained by scaling P(y) by 2.
Herrmann derived this expression for the design of Butterworth or maximally flat FIR digital filters
[link].
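The closed-form solution can be checked directly. The Python sketch below (function names are our own) builds the binomial coefficients of P(y) and evaluates (1 − y)^K P(y) + y^K P(1 − y) on a grid. With the binomial sum normalized so that P(0) = 1, this expression evaluates to 1; the value 2 in the orthogonality condition corresponds to the Σ_n h(n) = √2 normalization, that is, to |H(ω)|² = 2 (1 − y)^K P(y).

```python
from math import comb

def daub_P(K):
    """Coefficients of P(y) = sum_{k=0}^{K-1} C(K-1+k, k) y^k, lowest power first."""
    return [comb(K - 1 + k, k) for k in range(K)]

def polyval(c, y):
    """Evaluate a polynomial given by coefficients c (lowest power first)."""
    return sum(ck * y ** k for k, ck in enumerate(c))

def bezout_lhs(K, y):
    """(1 - y)^K P(y) + y^K P(1 - y) for the binomial P(y)."""
    P = daub_P(K)
    return (1 - y) ** K * polyval(P, y) + y ** K * polyval(P, 1 - y)

grid = [i / 20 for i in range(21)]
worst = max(abs(bezout_lhs(K, y) - 1.0) for K in range(1, 7) for y in grid)
print(daub_P(4), worst)
```

Since all coefficients of P(y) are positive, P(y) > 0 on 0 ≤ y ≤ 1, which is what the spectral factorization requires.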


If the regularity is K < N/2, P(y) must be of higher degree and the form of the solution is
Equation:


\[ P(y) = \sum_{k=0}^{K-1} \binom{K-1+k}{k} y^{k} + y^{K} R\left( \tfrac{1}{2} - y \right) \]


where R(y) is chosen to give the desired filter length N, to achieve some other desired property, and
to give P(y) ≥ 0.


The steps in calculating the actual values of h(n) are to first choose the length N (or the desired
regularity) for h(n), then factor |H(ω)|² where there will be freedom in choosing which roots to use
for H(ω). The calculations are more easily carried out using the z-transform form of the transfer
function and using convolution in the time domain rather than multiplication (raising to a power) in the
frequency domain. That is done in the Matlab program [hn, h1n] = daub(N) in Appendix C
where the polynomial coefficients in [link] are calculated from the binomial coefficient formula. This
polynomial is factored with the roots command in Matlab and the roots are mapped from the
polynomial variable y_ℓ to the variable z_ℓ in [link] using first cos(ω) = 1 − 2 y_ℓ, then with
i sin(ω) = √(cos²(ω) − 1) and e^{iω} = cos(ω) + i sin(ω) we use z = e^{iω}. These changes of
variables are used by Herrmann [link] and Daubechies [link].
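To make the root mapping concrete, here is a minimal Python sketch of the same steps for the two shortest nontrivial cases, K = 2 and K = 3 (lengths 4 and 6). It is not the daub program of Appendix C: the function names are our own, and the roots of P(y) are found in closed form (linear or quadratic) instead of with a general root finder, which is why the sketch is restricted to K ≤ 3. Roots inside the unit circle are kept, giving the minimum-phase factor.

```python
import cmath
import math
from math import comb

def convolve(a, b):
    """Multiply two polynomials given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def daub_min_phase(K):
    """Minimum-phase Daubechies scaling filter of length 2K, for K = 2 or 3."""
    if K not in (2, 3):
        raise ValueError("this sketch only handles K = 2 or K = 3")
    c = [comb(K - 1 + k, k) for k in range(K)]       # coefficients of P(y)
    if K == 2:
        y_roots = [-c[0] / c[1]]
    else:
        disc = cmath.sqrt(c[1] ** 2 - 4 * c[2] * c[0])
        y_roots = [(-c[1] + disc) / (2 * c[2]), (-c[1] - disc) / (2 * c[2])]
    # K zeros at z = -1: the factor (1 + z^-1)^K, built by convolution.
    h = [1.0]
    for _ in range(K):
        h = convolve(h, [1.0, 1.0])
    for y in y_roots:
        cos_w = 1 - 2 * y                            # cos(w) = 1 - 2y
        i_sin_w = cmath.sqrt(cos_w * cos_w - 1)      # i sin(w)
        z = cos_w + i_sin_w                          # z = e^{iw}; z and 1/z pair
        if abs(z) > 1:
            z = cos_w - i_sin_w                      # keep the root inside |z| = 1
        h = convolve(h, [1.0, -z])                   # factor (1 - z_l z^-1)
    h = [complex(x).real for x in h]                 # imaginary parts cancel
    scale = math.sqrt(2) / sum(h)                    # enforce sum h(n) = sqrt(2)
    return [scale * x for x in h]

h4 = daub_min_phase(2)
h6 = daub_min_phase(3)
print(h4)
print(h6)
```

Choosing the roots outside the unit circle instead would give the maximum-phase (time-reversed) filter with the same magnitude response.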


Examine the Matlab program to see the details of just how this is carried out. The program uses the 
sort command to order the roots of H(z) H(1/z) after which it chooses the N — 1 smallest ones to 
give a minimum phase H(z) factorization. You could choose a different set of N — 1 roots in an effort 
to get a more linear phase or even maximum phase. This choice allows some variation in Daubechies 
wavelets of the same length. The M-band generalization of this is developed by Heller in [link], [link]. 
In [link], Daubechies also considers an alternation of zeros inside and outside the unit circle which 
gives a more symmetric h(n). A completely symmetric real h(n) that has compact support and 
supports orthogonal wavelets is not possible; however, symmetry is possible for complex h(n), 
biorthogonal systems, infinitely long h(n), and multiwavelets. Use of this zero moment design
approach will also ensure that the resulting wavelet system is an orthonormal basis.


If all the degrees of freedom are used to set moments to zero, one uses K = N/2 in [link] and the
above procedure is followed. It is possible to explicitly set a particular pair of zeros somewhere other
than at ω = π. In that case, one would use K = (N/2) − 2 in [link]. Other constraints are developed
later in this chapter and in later chapters.


To illustrate some of the characteristics of a Daubechies wavelet system, [link] shows the scaling
function and wavelet coefficients, h(n) and h_1(n), and the corresponding discrete scaling coefficient
moments and wavelet coefficient moments for a length-8 Daubechies system. Note the N/2 = 4 zero
moments of the wavelet coefficients and the zeroth scaling coefficient moment of μ(0) = √2.


n    h(n)                 h_1(n)               μ(k)          μ_1(k)          k

0     0.23037781330890     0.01059740178507     1.414213        0            0
1     0.71484657055292     0.03288301166689     1.421840        0            1
2     0.63088076792986    -0.03084138183556     1.429509        0            2
3    -0.02798376941686    -0.18703481171909     0.359097        0            3
4    -0.18703481171909     0.02798376941686    -2.890773       12.549900     4
5     0.03084138183556     0.63088076792986    -3.453586      267.067254     5
6     0.03288301166689    -0.71484657055292    23.909120     3585.681937     6
7    -0.01059740178507     0.23037781330890


Scaling Function and Wavelet Coefficients plus their Discrete Moments for Daubechies-8


[link] gives the same information for the length-6, 4, and 2 Daubechies scaling coefficients, wavelet
coefficients, scaling coefficient moments, and wavelet coefficient moments. Again notice how many
discrete wavelet moments are zero.


[link] shows the continuous moments of the scaling function φ(t) and wavelet ψ(t) for the Daubechies
systems with lengths six and four. The discrete moments are the moments of the coefficients defined
by [link] and [link] with the continuous moments defined by [link] and [link] calculated using [link]
and [link] with the programs listed in Appendix C.


Daubechies N = 6

n    h(n)                 h_1(n)               μ(k)          μ_1(k)        k
0     0.33267055295008    -0.03522629188571     1.414213       0           0
1     0.80689150931109    -0.08544127388203     1.155979       0           1
2     0.45987750211849     0.13501102001025     0.944899       0           2
3    -0.13501102001025     0.45987750211849    -0.224341       3.354101    3
4    -0.08544127388203    -0.80689150931109    -2.627495      40.679682    4
5     0.03522629188571     0.33267055295008     5.305591     329.323717    5


Daubechies N = 4

n    h(n)                 h_1(n)               μ(k)          μ_1(k)        k
0     0.48296291314453     0.12940952255126     1.414213       0           0
1     0.83651630373781     0.22414386804201     0.896575       0           1
2     0.22414386804201    -0.83651630373781     0.568406       1.224744    2
3    -0.12940952255126     0.48296291314453    -0.864390       6.572012    3


Daubechies N = 2

n    h(n)                 h_1(n)               μ(k)          μ_1(k)        k
0     0.70710678118655     0.70710678118655     1.414213       0           0
1     0.70710678118655    -0.70710678118655     0.707107       0.707107    1


Daubechies Scaling Function and Wavelet Coefficients plus their Moments


N = 6

k    μ(k)           μ_1(k)          m(k)           m_1(k)
0     1.4142135       0              1.0000000      0
1     1.1559780       0              0.8174012      0
2     0.9448992       0              0.6681447      0
3    -0.2243420       3.3541019      0.4454669      0.2964635
4    -2.6274948      40.6796819      0.1172263      2.2824642
5     5.3055914     329.3237168     -0.0466511     11.4461157


N = 4

k    μ(k)           μ_1(k)          m(k)           m_1(k)
0     1.4142136       0              1.0000000      0
1     0.8965755       0              0.6343975      0
2     0.5684061       1.2247449      0.4019238      0.2165063
3    -0.8643899       6.5720121      0.1310915      0.7867785
4    -6.0593531      25.9598790     -0.3021933      2.0143421
5   -23.4373939      90.8156100     -1.0658728      4.4442798


Daubechies Scaling Function and Wavelet Continuous and Discrete Moments


These tables are very informative about the characteristics of wavelet systems in general as well as
particularities of the Daubechies system. We see the μ(0) = √2 of [link] and [link] that is necessary
for the existence of a scaling function solution to [link] and the μ_1(0) = m_1(0) = 0 of [link] and
[link] that is necessary for the orthogonality of the basis functions. Orthonormality requires [link]
which is seen in comparison of the h(n) and h_1(n), and it requires m(0) = 1 from [link] and [link].
After those conditions are satisfied, there are N/2 − 1 degrees of freedom left which Daubechies uses
to set wavelet moments m_1(k) equal to zero. For length-6 we have two zero wavelet moments and for
length-4, one. For all longer Daubechies systems we have exactly N/2 − 1 zero wavelet moments in
addition to the one m_1(0) = 0 for a total of N/2 zero wavelet moments. Note m(2) = m(1)² as will
be explained in [link] and there exist relationships among some of the values of the even-ordered
scaling function moments, which will be explained in [link] through [link].


As stated earlier, these systems have a maximum number of zero moments of the wavelets which
results in a high degree of smoothness for the scaling and wavelet functions. [link] and [link] show the
Daubechies scaling functions and wavelets for N = 4, 6, 8, 10, 12, 16, 20, 40. The coefficients were
generated by the techniques described in Section: Parameterization of the Scaling Coefficients and
Chapter: Regularity, Moments, and Wavelet System Design. The Matlab programs are listed in
Appendix C and values of h(n) can be found in [link] or generated by the programs. Note the
increasing smoothness as N is increased. For N = 2, the scaling function is not continuous; for
N = 4, it is continuous but not differentiable; for N = 6, it is barely differentiable once; for N = 14,
it is twice differentiable, and similarly for longer h(n). One can obtain any degree of differentiability
for sufficiently long h(n).


The Daubechies coefficients are obtained by maximizing the number of moments that are zero. This
gives regular scaling functions and wavelets, but it is possible to use the degrees of
freedom to maximize the differentiability of φ(t) rather than maximize the zero moments. This is not
easily parameterized, and it gives only slightly greater smoothness than the Daubechies system [link].


Daubechies Scaling Functions, N = 4, 6, 8, ..., 40


Daubechies Wavelets, N = 4, 6, 8, ..., 40


Examples of Daubechies scaling functions resulting from choosing different factors in the spectral
factorization of |H(ω)|² in [link] can be found in [link].


Non-Maximal Regularity Wavelet Design 


If the length of the scaling coefficient filter is longer than twice the desired regularity, i.e., N > 2K,
then the parameterization of [link] should be used and the coefficients in the polynomial R(y) must be
determined. One interesting possibility is that of designing a system that has one or more zeros of
H(ω) between ω = π/2 and ω = π with the remaining zeros at π to contribute to the regularity. This
will give better frequency separation in the filter bank in exchange for reduced regularity and lower
degree polynomial representation.


If a zero of H(ω) is set at ω = ω_0, then the conditions of
Equation:


\[ M(\omega_0) = 0 \quad \text{and} \quad \left. \frac{dM(\omega)}{d\omega} \right|_{\omega=\omega_0} = 0 \]


are imposed with those in [link], giving a set of linear simultaneous equations that can be solved to
find the scaling coefficients h(n).


A powerful design system based on a Remez exchange algorithm allows the design of an orthogonal
wavelet system that has a specified regularity and an optimal Chebyshev behavior in the stopband of
H(ω). This design and a variation that uses a constrained least square criterion [link] are described in
[link], and another Chebyshev design is given in [link].


An alternative approach is to design the wavelet system by setting an optimization criterion and using 
a general constrained optimization routine, such as that in the Matlab optimization toolbox, to design 
the h(n) with the existence and orthogonality conditions as constraints. This approach was used to 
generate many of the filters described in [link]. Jun Tian used Newton's method [link], [link] to
design wavelet systems with zero moments.


Relation of Zero Wavelet Moments to Smoothness 


We see from [link] and [link] that there is a relationship between zero wavelet moments and 
smoothness, but it is not tight. Although in practical application the degree of differentiability may not 
be the most important measure, it is an important theoretical question that should be addressed before 
application questions can be properly posed. 


First we must define smoothness. From the mathematical point of view, smoothness is essentially the
same as differentiability and there are at least two ways to pose that measure. The first is local (the
Hölder measure) and the second is global (the Sobolev measure). Numerical algorithms for estimating
the measures in the wavelet setting have been developed by Rioul [link] and Heller and Wells [link],
[link] for the Hölder and Sobolev measures respectively.


Definition 1 (Hölder continuity) Let φ : ℝ → ℂ and let 0 < α ≤ 1. Then the function φ is Hölder
continuous of order α if there exists a constant c such that
Equation:


\[ |\varphi(x) - \varphi(y)| \le c\, |x - y|^{\alpha}, \quad \text{for all } x, y \in \mathbb{R}. \]


Based on the above definition, one observes φ has to be a constant if α > 1. Hence, this is not very
useful for determining regularity of order α > 1. However, using the above definition, Hölder
regularity of any order r > 0 is defined as follows:


Definition 2 (Hölder regularity) A function φ : ℝ → ℂ is regular of order r = P + α (0 < α ≤ 1) if
φ ∈ C^P and its P-th derivative is Hölder continuous of order α.

Definition 3 (Sobolev regularity)
Let φ : ℝ → ℂ, then φ is said to belong to the Sobolev space of order s (φ ∈ H^s) if


Equation:


\[ \int_{\mathbb{R}} \left( 1 + |\omega|^{2} \right)^{s} |\hat{\varphi}(\omega)|^{2}\, d\omega < \infty. \]


Notice that, although Sobolev regularity does not give the explicit order of differentiability, it does
yield lower and upper bounds on r, the Hölder regularity, and hence the differentiability of φ if
φ ∈ L². This can be seen from the following inclusions:

Equation:


\[ H^{s+1/2} \subset C^{s} \subset H^{s}. \]

A very interesting and important result by Volkmer [link] and by Eirola [link] gives an exact
asymptotic formula for the Hölder regularity index (exponent) of the Daubechies scaling function.


Theorem 24 The limit of the Hölder regularity index of a Daubechies scaling function as the length of
the scaling filter goes to infinity is [link]
Equation: 


This result, which was also proven by A. Cohen and J. P. Conze, together with empirical calculations
for shorter lengths, gives a good picture of the smoothness of Daubechies scaling functions. This is
illustrated in [link] where the Hölder index is plotted versus scaling filter length for both the
maximally smooth case and the Daubechies case.


The question of the behavior of maximally smooth scaling functions was empirically addressed by
Lang and Heller in [link]. They use an algorithm by Rioul to calculate the Hölder smoothness of
scaling functions that have been designed to have maximum Hölder smoothness and the results are
shown in [link] together with the smoothness of the Daubechies scaling functions as functions of the
length of the scaling filter. For the longer lengths, it is possible to design systems that give a scaling
function over twice as smooth as with a Daubechies design. In most applications, however, the greater
Hölder smoothness is probably not important.


Hölder Smoothness versus Coefficient Length for
Daubechies' (+) and Maximally Smooth (o) Wavelets.


Number of Zeros at ω = π versus Coefficient Length for
Daubechies' (+) and Maximally Smooth (o) Wavelets.


[link] shows the number of zero moments (zeros at ω = π) as a function of the number of scaling
function coefficients for both the maximally smooth and Daubechies designs.


One case from this figure is for N = 26 where the Daubechies smoothness is S_H = 4.005 and the
maximum smoothness is S_H = 5.06. The maximally smooth scaling function has one more
continuous derivative than the Daubechies scaling function.


Recent work by Heller and Wells [link], [link] gives a better connection of properties of the scaling
coefficients and the smoothness of the scaling function and wavelets. This is done both for the scale
factor or multiplicity of M = 2 and for general integer M.


The usual definition of smoothness in terms of differentiability may not be the best measure for certain
signal processing applications. If the signal is given as a sequence of numbers, not as a function of a
continuous variable, what does smoothness mean? Perhaps the use of the variation of a signal may be
a useful alternative [link], [link], [link].


Vanishing Scaling Function Moments 


While the moments of the wavelets give information about flatness of H(ω) and smoothness of ψ(t),
the moments of φ(t) and h(n) are measures of the “localization” and symmetry characteristics of the
scaling function and, therefore, the wavelet transform. We know from [link] that Σ_n h(n) = √2 and,
after normalization, that ∫ φ(t) dt = 1. Using [link], one can show [link] that for K ≥ 2, we have
Equation:


\[ m(2) = m^{2}(1). \]


This can be seen in [link]. A generalization of this result has been developed by Johnson [link] and is
given in [link] through [link].


A more general picture of the effects of zero moments can be seen by next considering two 
approximations. Indeed, this analysis gives a very important insight into the effects of zero moments. 
The mixture of zero scaling function moments with other specifications is addressed later in [link]. 


Approximation of Signals by Scaling Function Projection 


The orthogonal projection of a signal f(t) on the scaling function subspace V_j is given and denoted by
Equation:


\[ P^{j}\{f(t)\} = \sum_k \langle f(t), \varphi_{j,k}(t) \rangle\, \varphi_{j,k}(t) \]


which gives the component of f(t) which is in V_j and which is the best least squares approximation to
f(t) in V_j.


As given in [link], the ℓ-th moment of ψ(t) is defined as
Equation:


\[ m_1(\ell) = \int t^{\ell}\, \psi(t)\, dt. \]


We can now state an important relation of the projection [link] as an approximation to f(t) in terms of 
the number of zero wavelet moments and the scale. 


Theorem 25 If m_1(ℓ) = 0 for ℓ = 0, 1, ···, L then the L² error is
Equation:


\[ \epsilon_1 = \| f(t) - P^{j}\{f(t)\} \|_2 \le C\, 2^{-j(L+1)}, \]


where C is a constant independent of j and L but dependent on f(t) and the wavelet system [link],
[link].


This states that at any given scale, the projection of the signal on the subspace at that scale approaches 
the function itself as the number of zero wavelet moments (and the length of the scaling filter) goes to 
infinity. It also states that for any given length, the projection goes to the function as the scale goes to 
infinity. These approximations converge exponentially fast. This projection is illustrated in [link]. 


Approximation of Scaling Coefficients by Samples of the Signal 


A second approximation involves using the samples of f(t) as the inner product coefficients in the
wavelet expansion of f(t) in [link]. We denote this sampling approximation by
Equation:


\[ S^{j}\{f(t)\} = \sum_k 2^{-j/2}\, f(k/2^{j})\, \varphi_{j,k}(t) \]


and the scaling function moment by
Equation:


\[ m(\ell) = \int t^{\ell}\, \varphi(t)\, dt \]


and can state [link] the following


Theorem 26 If m(ℓ) = 0 for ℓ = 1, 2, ···, L then the L² error is
Equation:


\[ \epsilon_2 = \| S^{j}\{f(t)\} - P^{j}\{f(t)\} \|_2 \le C_2\, 2^{-j(L+1)}, \]


where C_2 is a constant independent of j and L but dependent on f(t) and the wavelet system.


This is a similar approximation or convergence result to the previous theorem but relates the projection 
of f(t) on a j-scale subspace to the sampling approximation in that same subspace. These 
approximations are illustrated in [link]. 
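The scale dependence in Theorems 25 and 26 can be seen in the simplest possible setting. The Python sketch below is only an illustration under assumptions of our own choosing: it uses the Haar system (L = 0 in both theorems, so the bounds decay as 2^{−j}), the test signal f(t) = t restricted to [0, 1), and a midpoint rule for the integrals. For Haar, the projection P^j replaces f(t) on each dyadic interval by its average, and the sampling approximation S^j replaces it by the sample at the left end of the interval.

```python
import math

def haar_errors(f, j, fine=64):
    """L2 errors on [0, 1) at scale j for the Haar system.

    Returns (e1, e2) with e1 = ||f - P^j f|| and e2 = ||S^j f - P^j f||.
    """
    n = 2 ** j                       # number of dyadic intervals in [0, 1)
    w = 1.0 / n                      # interval width 2^-j
    e1_sq = 0.0
    e2_sq = 0.0
    for k in range(n):
        # Midpoint-rule nodes on [k w, (k + 1) w).
        ts = [(k + (i + 0.5) / fine) * w for i in range(fine)]
        vals = [f(t) for t in ts]
        avg = sum(vals) / fine       # value of the projection on this interval
        samp = f(k * w)              # value of the sampling approximation
        e1_sq += sum((v - avg) ** 2 for v in vals) * w / fine
        e2_sq += (samp - avg) ** 2 * w
    return math.sqrt(e1_sq), math.sqrt(e2_sq)

def ramp(t):
    return t

errors = {j: haar_errors(ramp, j) for j in range(2, 8)}
for j in range(3, 8):
    r1 = errors[j][0] / errors[j - 1][0]
    r2 = errors[j][1] / errors[j - 1][1]
    print(j, errors[j], r1, r2)
```

Both errors are halved each time j increases by one, as the 2^{−j(L+1)} bound with L = 0 suggests. A system with more zero moments would show a faster decay, but computing its projections requires more than the interval averages used here.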


Approximation and Projection of f(t) at a Finite Scale 


This “vector space” illustration shows the nature and relationships of the two types of approximations.
The use of samples as inner products is an approximation within the expansion subspace V_j. The use
of a finite expansion to represent a signal f(t) is an approximation from L² onto the subspace V_j.
Theorems [link] and [link] show the nature of those approximations, which, for wavelets, is very good.


An illustration of the effects of these approximations on a signal is shown in [link] where a signal with 
a very smooth component (a sinusoid) and a discontinuous component (a square wave) is expanded in 
a wavelet series using samples as the high resolution scaling function coefficients. Notice the effects of 
projecting onto lower and lower resolution scales. 


If we consider a wavelet system where the same number of scaling function and wavelet moments are
set to zero and this number is as large as possible, then the following is true [link], [link]:


Theorem 27 If m(ℓ) = m_1(ℓ) = 0 for ℓ = 1, 2, ···, L and m_1(0) = 0, then the L² error is
Equation:


\[ \epsilon_3 = \| f(t) - S^{j}\{f(t)\} \|_2 \le C_3\, 2^{-j(L+1)}, \]


where C_3 is a constant independent of j and L, but dependent on f(t) and the wavelet system.


Here we see that for this wavelet system, called a Coifman wavelet system, using samples as the
inner product expansion coefficients is an excellent approximation. This justifies the claim that using
samples of a signal as input to a filter bank gives a proper wavelet analysis. This approximation is also
illustrated in [link] and in [link].


Coiflets and Related Wavelet Systems 


From the previous approximation theorems, we see that a combination of zero wavelet and zero 
scaling function moments used with samples of the signal may give superior results to wavelets with 
only zero wavelet moments. Not only does forcing zero scaling function moments give a better 
approximation of the expansion coefficients by samples, it often causes the scaling function to be more 
symmetric. Indeed, that characteristic may be more important than the sample approximation in certain 
applications. 


Daubechies considered the design of these wavelets which were suggested by Coifman [link], [link],
[link]. Gopinath [link], [link] and Wells [link], [link] show how zero scaling function moments give a
better approximation of high-resolution scaling coefficients by samples. Tian and Wells [link], [link]
have also designed biorthogonal systems with mixed zero moments with very interesting properties.


Approximations to f(t) at Different Finite Scales


The Coifman wavelet system (Daubechies named the basis functions “coiflets”) is an orthonormal
multiresolution wavelet system with
Equation:


\[ \int t^{k}\, \varphi(t)\, dt = m(k) = 0, \quad \text{for } k = 1, 2, \cdots, L-1 \]


Equation:


\[ \int t^{k}\, \psi(t)\, dt = m_1(k) = 0, \quad \text{for } k = 1, 2, \cdots, L-1. \]


This definition imposes the requirement that there be at least L − 1 zero scaling function moments and
at least L − 1 wavelet moments in addition to the one zero moment of m_1(0) required by
orthogonality. This system is said to be of order or degree L and sometimes has the additional
requirement that the length of the scaling function filter h(n), which is denoted N, is minimum [link],
[link]. In the design of these coiflets, one obtains more total zero moments than N/2 − 1. This was
first noted by Beylkin, et al. [link]. The length-4 wavelet system has only one degree of freedom, so it
cannot have both a scaling function moment and wavelet moment of zero (see [link]). Tian [link],
[link] has derived formulas for four length-6 coiflets. These are:


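Equation:


\[ h = \left[ \frac{1-\sqrt{7}}{16\sqrt{2}},\ \frac{5+\sqrt{7}}{16\sqrt{2}},\ \frac{7+\sqrt{7}}{8\sqrt{2}},\ \frac{7-\sqrt{7}}{8\sqrt{2}},\ \frac{1-\sqrt{7}}{16\sqrt{2}},\ \frac{-3+\sqrt{7}}{16\sqrt{2}} \right], \]


or
Equation:


\[ h = \left[ \frac{1+\sqrt{7}}{16\sqrt{2}},\ \frac{5-\sqrt{7}}{16\sqrt{2}},\ \frac{7-\sqrt{7}}{8\sqrt{2}},\ \frac{7+\sqrt{7}}{8\sqrt{2}},\ \frac{1+\sqrt{7}}{16\sqrt{2}},\ \frac{-3-\sqrt{7}}{16\sqrt{2}} \right], \]


or
Equation:


\[ h = \left[ \frac{-3+\sqrt{15}}{16\sqrt{2}},\ \frac{1-\sqrt{15}}{16\sqrt{2}},\ \frac{3-\sqrt{15}}{8\sqrt{2}},\ \frac{3+\sqrt{15}}{8\sqrt{2}},\ \frac{13+\sqrt{15}}{16\sqrt{2}},\ \frac{9-\sqrt{15}}{16\sqrt{2}} \right], \]


or
Equation:


\[ h = \left[ \frac{-3-\sqrt{15}}{16\sqrt{2}},\ \frac{1+\sqrt{15}}{16\sqrt{2}},\ \frac{3+\sqrt{15}}{8\sqrt{2}},\ \frac{3-\sqrt{15}}{8\sqrt{2}},\ \frac{13-\sqrt{15}}{16\sqrt{2}},\ \frac{9+\sqrt{15}}{16\sqrt{2}} \right] \]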

with the first formula [link] giving the same result as Daubechies [link], [link] (corrected) and that of
Odegard [link] and the third giving the same result as Wickerhauser [link]. The results from [link] are
included in [link] along with the discrete moments of the scaling function and wavelet, μ(k) and
μ_1(k) for k = 0, 1, 2, 3. The design of a length-6 Coifman system specifies one zero scaling function
moment and one zero wavelet moment (in addition to μ_1(0) = 0), but we, in fact, obtain one extra
zero scaling function moment. That is the result of m(2) = m(1)² from [link]. In other words, we get
one more zero scaling function moment than the two degrees of freedom would seem to indicate. This
is true for all lengths N = 6ℓ for ℓ = 1, 2, 3, ··· and is a result of the interaction between the scaling
function moments and the wavelet moments described later.
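The extra zero moment can be confirmed numerically. The Python sketch below (function names are our own) uses the closed form of the length-6 coiflet whose decimal values, −0.0727..., 0.3379..., 0.8526..., 0.3849..., −0.0727..., −0.0157..., appear in the coefficient table of this chapter, and indexes it from n = −2, which is an assumption about the shift under which the scaling filter moments vanish. The wavelet filter moments are tested through the alternating-sign sums Σ_n (−1)^n n^k h(n), which vanish for k = 0, 1 exactly when μ_1(0) = μ_1(1) = 0, whatever sign and shift convention is used for h_1(n).

```python
import math

r7 = math.sqrt(7.0)
den = 16 * math.sqrt(2.0)
h = [(1 - r7) / den, (5 + r7) / den, (14 + 2 * r7) / den,
     (14 - 2 * r7) / den, (1 - r7) / den, (-3 + r7) / den]
START = -2  # assumed index of the first coefficient

def moment(h, k, start):
    """Discrete moment sum_n n^k h(n) with n running from `start`."""
    return sum((start + i) ** k * c for i, c in enumerate(h))

def alt_moment(h, k, start):
    """Alternating-sign moment sum_n (-1)^n n^k h(n)."""
    total = 0.0
    for i, c in enumerate(h):
        n = start + i
        sign = 1 if n % 2 == 0 else -1
        total += sign * n ** k * c
    return total

def autocorr(h, m):
    """sum_k h(k) h(k + 2m), which should equal delta(m)."""
    return sum(h[k] * h[k + 2 * m] for k in range(len(h) - 2 * m))

print([moment(h, k, START) for k in range(4)])
print([alt_moment(h, k, START) for k in range(3)])
print([autocorr(h, m) for m in range(3)])
```

With this indexing, μ(1) and μ(2) both vanish while μ(3) does not, and two alternating moments vanish, which is one more zero scaling function moment than the two free parameters of a length-6 design would suggest.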


The property of zero wavelet moments is shift invariant, but the zero scaling function moments are 
shift dependent [link]. Therefore, a particular shift for the scaling function must be used. This shift is 
two for the length-6 example in [link], but is different for the solutions in [link] and [link]. Compare 
this table to the corresponding one for Daubechies length-6 scaling functions and wavelets given in 
[link] where there are two zero discrete wavelet moments — just as many as the degrees of freedom in 
that design. 


The scaling function from [link] is fairly symmetric, but not around its center, and the other three
designs in [link], [link], and [link] are not symmetric at all. The scaling function from [link] is also
fairly smooth, and from [link] only slightly less so, but the scaling function from [link] is very rough
and from [link] seems to be fractal. Examination of the frequency response H(ω) and the zero location
of the FIR filters h(n) shows very similar frequency responses for [link] and [link] with [link] having
a somewhat irregular but monotonic frequency response and [link] having a zero on the unit circle at
ω = 1/3, i.e., not satisfying Cohen's condition [link] for an orthogonal basis. It is also worth noticing
that the design in [link] has the largest Hölder smoothness. These four designs, all satisfying the same
necessary conditions, have very different characteristics. This tells us to be very careful in using zero
moment methods to design wavelet systems. The designs are not unique and some are much better than
others.


[link] contains the scaling function and wavelet coefficients for the length-6 and length-12 systems designed by 
Daubechies and the length-8 system designed by Tian, together with their discrete moments. We see the extra zero 
scaling function moments for lengths 6 and 12 and also the extra zero for lengths 8 and 12 that occurs 
after a nonzero one. 


The continuous moments can be calculated from the discrete moments and lower order continuous 
moments [link], [link], [link] using [link] and [link]. An important relationship of the discrete moments 
for a system with K - 1 zero wavelet moments is found by calculating the derivatives of the 
magnitude squared of the discrete time Fourier transform of h(n), which is H(ω) = Σ_n h(n) e^{-iωn} 
and has 2K - 1 zero derivatives of the magnitude squared at ω = 0. This gives [link] the k-th 
derivative for k even and 1 < k < 2K - 1 

Equation: 

\sum_{\ell=0}^{k} \binom{k}{\ell} (-1)^{\ell} \, \mu(\ell) \, \mu(k-\ell) = 0 . 

Solving for μ(k) in terms of lower order discrete moments and using μ(0) = √2 gives for k even 
Equation: 

\mu(k) = \frac{1}{2\sqrt{2}} \sum_{\ell=1}^{k-1} \binom{k}{\ell} (-1)^{\ell-1} \, \mu(\ell) \, \mu(k-\ell) 

which allows calculating the even-order discrete scaling function moments in terms of the lower 
odd-order discrete scaling function moments for k = 2, 4, ..., 2K - 2. For example: 
Equation: 

\mu(2) = \frac{1}{\sqrt{2}} \, \mu^2(1), \qquad \mu(4) = \frac{1}{2\sqrt{2}} \left[ 8 \, \mu(1) \, \mu(3) - 3 \, \mu^4(1) \right] 

which can be seen from values in [link]. 
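
The even-order relation for the discrete moments can be checked numerically. The sketch below is illustrative only: the function names are made up for this example, and the filter is the length-4 Daubechies scaling filter in its usual closed form (an assumption brought in here, indexed n = 0, ..., 3, with K = 2), so that μ(1) is nonzero and the check is not trivial.

```python
from math import comb, sqrt

def discrete_moment(h, k, n0=0):
    """k-th discrete moment mu(k) = sum_n n^k h(n); h[0] sits at index n0."""
    return sum((n0 + i) ** k * c for i, c in enumerate(h))

def even_moment_from_lower(k, mu):
    """Solve sum_{l=0}^{k} C(k,l) (-1)^l mu[l] mu[k-l] = 0 for mu[k], k even."""
    inner = sum(comb(k, l) * (-1) ** (l - 1) * mu[l] * mu[k - l]
                for l in range(1, k))
    return inner / (2.0 * mu[0])

# Length-4 Daubechies scaling filter (assumed closed form, n = 0..3).
s3 = sqrt(3.0)
h = [(1 + s3) / (4 * sqrt(2)), (3 + s3) / (4 * sqrt(2)),
     (3 - s3) / (4 * sqrt(2)), (1 - s3) / (4 * sqrt(2))]

mu = [discrete_moment(h, k) for k in range(3)]
predicted_mu2 = even_moment_from_lower(2, mu)
print("mu(2) computed directly:", mu[2])
print("mu(2) from mu(0), mu(1):", predicted_mu2)
```

With K = 2 the relation is only expected to hold for k = 2; applying it at k = 4 to this filter would need more zero wavelet moments.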


Johnson [link] noted from Beylkin [link] and Unser [link] that by using the moments of the 
autocorrelation function of the scaling function, a relationship of the continuous scaling function 
moments can be derived in the form 

Equation: 

\sum_{\ell=0}^{k} \binom{k}{\ell} (-1)^{\ell} \, m(\ell) \, m(k-\ell) = 0 

where 0 < k < 2K if K - 1 wavelet moments are zero. Solving for m(k) in terms of lower order 
moments gives for k even 
Equation: 

m(k) = \frac{1}{2} \sum_{\ell=1}^{k-1} \binom{k}{\ell} (-1)^{\ell-1} \, m(\ell) \, m(k-\ell) 

which allows calculating the even-order scaling function moments in terms of the lower odd-order 
scaling function moments for k = 2, 4, ..., 2K - 2. For example [link]: 
Equation: 

\begin{aligned}
m(2) &= m^2(1) \\
m(4) &= 4 \, m(3) \, m(1) - 3 \, m^4(1) \\
m(6) &= 6 \, m(5) \, m(1) + 10 \, m^2(3) - 60 \, m(3) \, m^3(1) + 45 \, m^6(1) \\
m(8) &= 8 \, m(7) \, m(1) + 56 \, m(5) \, m(3) - 168 \, m(5) \, m^3(1) + 2520 \, m(3) \, m^5(1) - 840 \, m^2(3) \, m^2(1) - 1575 \, m^8(1)
\end{aligned}

if the wavelet moments are zero up to k = K - 1. Notice that setting m(1) = m(3) = 0 causes 
m(2) = m(4) = m(6) = m(8) = 0 if sufficient wavelet moments are zero. This explains the extra 
zero moments in [link]. It also shows that the traditional specification of zero scaling function 
moments is redundant. In [link] m(8) would be zero if more wavelet moments were zero. 


Length-N = 6, Degree L = 2 
h(n)                  h_1(n)                k    μ(k)          μ_1(k) 
-0.07273261951285     0.01565572813546      0    1.414213      0 
 0.33789766245781    -0.07273261951285      1    0             0 
 0.85257202021226    -0.38486484686420      2    0            -1.163722 
 0.38486484686420     0.85257202021226      3   -0.375737     -3.866903 
-0.07273261951285    -0.33789766245781      4   -2.872795     -10.267374 
-0.01565572813546    -0.07273261951285 

Length-N = 8, Degree L = 3 
h(n)                  h_1(n)                k    μ(k)          μ_1(k) 
 0.04687500000000     0.01565572813546      0    1.414213      0 
-0.02116013576461    -0.07273261951285      1    0             0 
-0.14062500000000    -0.38486484686420      2    0             0 
 0.43848040729385     1.38486484686420      3   -2.994111      0.187868 
 1.38486484686420    -0.43848040729385      4    0             11.976447 
 0.38486484686420    -0.14062500000000      5   -45.851020    -43.972332 
-0.07273261951285     0.02116013576461      6    63.639610     271.348747 
-0.01565572813546     0.04687500000000 

Length-N = 12, Degree L = 4 
h(n)               h_1(n)             k    μ(k)           μ_1(k) 
 0.016387336463     0.000720549446    0    1.414213       0 
-0.041464936781     0.001823208870    1    0              0 
-0.067372554722    -0.005611434819    2    0              0 
 0.386110066823    -0.023680171946    3    0              0 
 0.812723635449     0.059434418646    4    0              11.18525 
 0.417005184423     0.076488599078    5   -5.911352       175.86964 
-0.076488599078    -0.417005184423    6    0              1795.33634 
-0.059434418646    -0.812723635449    7   -586.341304     15230.54650 
 0.023680171946    -0.386110066823    8    3096.310009    117752.68833 
 0.005611434819     0.067372554722 
-0.001823208870     0.041464936781 
-0.000720549446    -0.016387336463 


Coiflet Scaling Function and Wavelet Coefficients plus their Discrete Moments 


N = 6, L = 2 
k    μ(k)             μ_1(k)            m(k)             m_1(k) 
0    1.4142135623     0                 1.0000000000     0 
1    0                0                 0                0 
2    0               -1.1637219122      0               -0.2057189138 
3   -0.3757374752    -3.8669032118     -0.0379552166    -0.3417891854 
4   -2.8727952940    -10.2673737288    -0.1354248688    -0.4537580992 
5   -3.7573747525    -28.0624304008    -0.0857053279    -0.6103378310 

N = 8, L = 3 
k    μ(k)             μ_1(k)            m(k)             m_1(k) 
0    1.4142135623     0                 1.0000000000     0 
1    0                0                 0                0 
2    0                0                 0                0 
3   -2.9941117777     0.1878687376     -0.3024509630     0.0166054072 
4    0                11.9764471108     0                0.5292891854 
5   -45.8510203537   -43.9723329775    -1.0458570134    -0.9716604635 

Discrete and Continuous Moments for the Coiflet Systems 


To see the continuous scaling function and wavelet moments for these systems, [link] shows both the 
continuous and discrete moments for the length-6 and 8 coiflet systems. Notice the zero moment 
m(4) = μ(4) = 0 for length-8. The length-14, 20, and 26 systems also have the “extra” zero scaling 
moment just after the first nonzero moment. This always occurs for length-N = 6ℓ + 2 coiflets. 


[link] shows the length-6, 8, 10, and 12 coiflet scaling functions φ(t) and wavelets ψ(t). Notice their 
approximate symmetry and compare this to Daubechies' classical wavelet systems and her more 
symmetric ones achieved by using the different factorization mentioned in [link] and shown in [link]. 
The difference between these systems and truly symmetric ones (which requires giving up 
orthogonality, realness, or finite support) is probably negligible in many applications. 




Length-6, 8, 10, and 12 Coiflet Scaling Functions and Wavelets 


Coifman Wavelet Systems from a Specified Filter Length 


The preceding section shows that Coifman systems do not necessarily have an equal number of scaling 
function and wavelet moments equal to zero. Lengths N = 6ℓ + 2 have equal numbers of zero scaling 
function and wavelet moments, but always have even-order “extra” zero scaling function moments 
located after the first nonzero one. Lengths N = 6ℓ always have an “extra” zero scaling function 
moment. Indeed, both will have several even-order “extra” zero moments for longer N as a result of 
the relationships illustrated in [link] through [link]. Lengths N = 6ℓ - 2 do not occur for the original 
definition of a Coifman system if one looks only at the degree with minimum length. If we specify the 
length of the coefficient vector, all even lengths become possible, some with the same coiflet degree. 


We examine the general Coifman wavelet system defined in [link] and [link] and allow the number of 
specified zero scaling function and wavelet moments to differ by at most one. That will include all the 
reported coiflets plus length-10, 16, 22, and N = 6ℓ - 2. The length-10 was designed by Odegard 
[link] by setting the number of zero scaling function moments to 3 and the number of zero wavelet moments to 2 
rather than 2 and 2 for the length-8 or 3 and 3 for the length-12 coiflets. The result in [link] shows that 
the length-10 design again gives one extra zero scaling function moment, for a total that is two more than the 
number of zero wavelet moments. This is an even-order moment predicted by [link] and results in a 
total number of zero moments between that for length-8 and length-12, as one would expect. A similar 
approach was used to design length-16, 22, and 28. 


Length-N = 4, Degree L = 1 
n     h(n)               h_1(n)             k    μ(k)          μ_1(k) 
-1    0.224143868042     0.129409522551     0    1.414213      0 
 0    0.836516303737     0.482962913144     1    0            -0.517638 
 1    0.482962913144    -0.836516303737     2    0.189468      0.189468 
 2   -0.129409522551     0.224143868042     3   -0.776457      0.827225 

Length-N = 10, Degree L = 3 
n     h(n)               h_1(n)             k    μ(k)          μ_1(k) 
-2    0.032128481856     0.000233764788     0    1.414213      0 
-1   -0.075539271956    -0.000549618934     1    0             0 
 0   -0.096935064502    -0.013550370057     2    0             0 
 1    0.491549094027     0.033777338659     3    0             3.031570 
 2    0.805141083557     0.304413564385     4    0             24.674674 
 3    0.304413564385    -0.805141083557     5   -14.709025     138.980052 
 4   -0.033777338659     0.491549094027     6    64.986095     710.373341 
 5   -0.013550370057     0.096935064502 
 6    0.000549618934    -0.075539271956 
 7    0.000233764788     0.032128481856 


Coiflet Scaling Function and Wavelet Coefficients plus their Discrete Moments 


We have designed these “new” coiflet systems (e.g., N = 10, 16, 22, 28) by using the Matlab 
optimization toolbox constrained optimization function. Wells and Tian [link] used Newton's method 
to design lengths N = 6ℓ + 2 and N = 6ℓ coiflets up to length 30 [link]. Selesnick [link] has used a 
filter design approach. Still another approach is given by Wei and Bovik [link]. 


[link] also shows the result of designing a length-4 system, using the one degree of freedom to ask for 
one zero scaling function moment rather than one zero wavelet moment as we did for the Daubechies 
system. For length-4, we do not get any “extra” zero moments because there are not enough zero 
wavelet moments. Here we see a direct trade-off between zero scaling function moments and wavelet 
moments. Adding these new lengths to our traditional coiflets gives [link]. 
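
The trade-off for the length-4 design can be reproduced directly from the tabulated coefficients. The following sketch assumes the index range n = -1, 0, 1, 2 shown in the table for both h(n) and h_1(n); the helper name `discrete_moments` is hypothetical.

```python
def discrete_moments(coeffs, n_start, k_max):
    """Return [mu(0), ..., mu(k_max)] with mu(k) = sum_n n^k c(n)."""
    return [sum((n_start + i) ** k * c for i, c in enumerate(coeffs))
            for k in range(k_max + 1)]

# Length-4 coefficients as tabulated above, indexed n = -1, 0, 1, 2.
h = [0.224143868042, 0.836516303737, 0.482962913144, -0.129409522551]
h1 = [0.129409522551, 0.482962913144, -0.836516303737, 0.224143868042]

mu = discrete_moments(h, -1, 3)     # scaling filter moments
mu1 = discrete_moments(h1, -1, 3)   # wavelet filter moments
for k, (a, b) in enumerate(zip(mu, mu1)):
    print(f"k={k}  mu(k)={a: .6f}  mu_1(k)={b: .6f}")
```

The first scaling function moment vanishes while the first wavelet moment does not, which is the reverse of the length-4 Daubechies design.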


N     L     m = 0    m_1 = 0    m = 0     m_1 = 0    Total zero    Hölder 
            set      set*       actual    actual*    moments       exponent 
4     1     1        0          1         0          1             0.2075 
6     2     1        1          2         1          3             1.0137 
8     3     2        2          2         2          4             1.3887 
10    3     3        2          4         2          6             1.0909 
12    4     3        3          4         3          7             1.9294 
14    5     4        4          4         4          8             1.7309 
16    5     5        4          6         4          10            1.5558 
18    6     5        5          6         5          11            2.1859 
20    7     6        6          6         6          12            2.8531 
22    7     7        6          8         6          14            2.5190 
24    8     7        7          8         7          15            2.8300 
26    9     8        8          8         8          16            3.4404 
28    9     9        8          10        8          18            2.9734 
30    10    9        9          10        9          19            3.4083 


Moments for Various Length-N and Degree-L Coiflets, where (*) is the number of zero wavelet 
moments, excluding the m_1(0) = 0 


The fourth and sixth columns in [link] contain the number of zero wavelet moments, excluding the 
m_1(0) = 0 which is zero because of orthogonality in all of these systems. The extra zero scaling 
function moments that occur after a nonzero moment for N = 6ℓ + 2 are also excluded from the 
count. This table shows coiflets for all even lengths. It shows the extra zero scaling function moments 
that are sometimes achieved, how the total number of zero moments monotonically increases, and 
how the “smoothness” as measured by the Hölder exponent [link], [link], [link] increases with N and 
L. 


When both scaling function and wavelet moments are set to zero, a larger number can be obtained than 
is expected from considering the degrees of freedom available. As stated earlier, of the N degrees of 
freedom available from the N coefficients, h(n), one is used to ensure existence of φ(t) through the 
linear constraint [link], and N/2 are used to ensure orthonormality through the quadratic constraints 
[link]. This leaves N/2 - 1 degrees of freedom to achieve other characteristics. Daubechies used 
these to set the first N/2 - 1 wavelet moments to zero. If setting scaling function moments were 
independent of setting wavelet moments zero, one would think that the coiflet system would allow 
(N/2 - 1)/2 wavelet moments to be set zero and the same number of scaling function moments. For 
the coiflets described in [link], one always obtains more than this. The structure of this problem allows 
more zero moments to be both set and achieved than the simple degrees of freedom would predict. In 
fact, the coiflets achieve approximately 2N/3 total zero moments as compared with the number of 
degrees of freedom which is approximately N/2, and which is achieved by the Daubechies wavelet 
system. 


As noted earlier and illustrated in [link], these coiflets fall into three classes. Those with scaling filter 
lengths of N = 6ℓ + 2 (due to Tian) have equal numbers of zero scaling function and wavelet 
moments, but always have “extra” zero scaling function moments located after the first nonzero one. 
Lengths N = 6ℓ (due to Daubechies) always have one more zero scaling function moment than zero 
wavelet moments, and lengths N = 6ℓ - 2 (new) always have two more zero scaling function moments 
than zero wavelet moments. These “extra” zero moments are predicted by [link] to [link], and there 
will be additional even-order zero moments for longer lengths. We have observed that within each of 
these classes, the Hölder exponent increases monotonically. 


Length N       m = 0† achieved    m_1 = 0* achieved    Total zero moments 
N = 6ℓ + 2     (N - 2)/3          (N - 2)/3            (2/3)(N - 2) 
N = 6ℓ         N/3                (N - 3)/3            (2/3)(N - 3/2) 
N = 6ℓ - 2     (N + 2)/3          (N - 4)/3            (2/3)(N - 1) 


Number of Zero Moments for the Three Classes of Coiflets (ℓ = 1, 2, ...), *excluding μ_1(0) = 0, 
†excluding non-contiguous zeros 


The approach taken in some investigations of coiflets would specify the coiflet degree and then find 
the shortest filter that would achieve that degree. The lengths N = 6ℓ - 2 were not found by this 
approach because they have the same coiflet degree as the system just two shorter. However, they 
achieve two more zero scaling function moments than the shorter length with the same degree. By 
specifying the number of zero moments and/or the filter length, it is easier to see the complete picture. 
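
The three-class formulas tabulated above can be turned into a small lookup that compares the coiflet totals with the N/2 - 1 zero wavelet moments of a Daubechies system of the same length. This is a sketch under the stated conventions (wavelet count excludes the zeroth moment, scaling count excludes non-contiguous zeros); the function name is hypothetical, and length 4 is left out because, as noted earlier, it gains no extra zero moments.

```python
def coiflet_zero_moments(N):
    """Return (scaling, wavelet, total) zero-moment counts for even N >= 6."""
    if N < 6 or N % 2:
        raise ValueError("formulas apply to even lengths N >= 6")
    r = N % 6
    if r == 2:        # N = 6l + 2
        scaling, wavelet = (N - 2) // 3, (N - 2) // 3
    elif r == 0:      # N = 6l
        scaling, wavelet = N // 3, (N - 3) // 3
    else:             # N = 6l - 2
        scaling, wavelet = (N + 2) // 3, (N - 4) // 3
    return scaling, wavelet, scaling + wavelet

for N in range(6, 32, 2):
    s, w, total = coiflet_zero_moments(N)
    print(f"N={N:2d}  scaling={s:2d}  wavelet={w:2d}  total={total:2d}  "
          f"Daubechies wavelet moments={N // 2 - 1}")
```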


[link] is just part of a large collection of zero moment wavelet system designs with a wide variety of 
trade-offs that would be tailored to a particular application. In addition to the variety illustrated here, 
many (perhaps all) of these sets of specified zero moments have multiple solutions. This is certainly 
true for length-6 as illustrated in [link] through [link] and for other lengths that we have found 
experimentally. The variety of solutions for each length can have different shifts, different Hölder 
exponents, and different degrees of being approximately symmetric. 


The results of this chapter and section show the importance of moments to the characteristics of 
scaling functions and wavelets. It may not, however, be necessary or important to use the exact criteria 
of Daubechies or Coifman, but understanding the effects of zero moments is very important. Setting 
a few scaling function moments and a few wavelet moments to zero may be sufficient, with the 
remaining degrees of freedom used for some other optimization, either in the frequency domain or in 
the time domain. As is noted in the next section, an alternative might be to minimize a larger number 
of various moments rather than to zero a few [link]. 


Examples of the Coiflet Systems are shown in [link]. 


Minimization of Moments Rather than Zero Moments 


Odegard has considered the case of minimization of a larger number of moments rather than setting 
N/2 - 1 moments equal to zero [link], [link], [link]. This results in some improvement in representing or 
approximating a larger class of signals at the expense of a better approximation of a smaller class. 
Indeed, Götze [link] has shown that even in the designed zero moment wavelet systems, the 
implementation of the system in finite precision arithmetic results in nonzero moments and, in some 
cases, non-orthogonal systems. 


Generalizations of the Basic Multiresolution Wavelet System 


Up to this point in the book, we have developed the basic two-band wavelet system in some detail, 
trying to provide insight and intuition into this new mathematical tool. We will now develop a variety 
of interesting and valuable generalizations and extensions to the basic system, but in much less detail. 
We hope the detail of the earlier part of the book can be transferred to these generalizations and, 
together with the references, will provide an introduction to the topics. 


Tiling the Time—Frequency or Time—Scale Plane 


A qualitative descriptive presentation of the decomposition of a signal using wavelet systems or 
wavelet transforms consists of partitioning the time-scale plane into tiles according to the indices k 
and j defined in [link]. That is possible for orthogonal bases (or tight frames) because of Parseval's 
theorem. Indeed, it is Parseval's theorem that states that the signal energy can be partitioned on the 
time-scale plane. The shape and location of the tiles show the logarithmic nature of the partitioning 
using basic wavelets and how the M-band systems or wavelet packets modify the basic picture. They also 
show that time- or shift-varying wavelet systems, together with M-band systems and wavelet 
packets, can give an almost arbitrary partitioning of the plane. 


The energy in a signal is given in terms of the DWT by Parseval's relation in [link] or [link]. This 
shows the energy is a function of the translation index k and the scale index j. 

Equation: 

\int |f(t)|^2 \, dt = \sum_{\ell=-\infty}^{\infty} |c(\ell)|^2 + \sum_{j=0}^{\infty} \sum_{k=-\infty}^{\infty} |d_j(k)|^2 

The wavelet transform allows analysis of a signal or parameterization of a signal that can locate 
energy in both the time and scale (or frequency) domain within the constraints of the uncertainty 
principle. The spectrogram used in speech analysis is an example of using the short-time Fourier 
transform to describe speech simultaneously in the time and frequency domains. 


This graphical or visual description of the partitioning of energy in a signal using tiling depends on 
the structure of the system, not the parameters of the system. In other words, the tiling partitioning 
will depend on whether one uses M = 2 or M = 3, whether one uses wavelet packets or time-varying 
wavelets, or whether one uses over-complete frame systems. It does not depend on the 
particular coefficients h(n) or h_1(n), on the number of coefficients N, or the number of zero 
moments. One should remember that the tiling may look as if the indices j and k are continuous 
variables, but they are not. The energy is really a function of discrete variables in the DWT domain, 
and the boundaries of the tiles are symbolic of the partitioning. These tiling boundaries become more 
literal when the continuous wavelet transform (CWT) is used as described in [link], but even there it 
does not mean that the partitioned energy is literally confined to the tiles. 
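
The partitioning of energy over discrete indices can be illustrated with a finite-length orthonormal Haar transform. This is a minimal sketch, not the general DWT of the text: the Haar filters, the eight arbitrary test samples, and the function names are all assumptions made for the example. Detail levels are listed finest first, so the list position runs opposite to the scale index j.

```python
from math import sqrt

def haar_dwt(x, levels):
    """Orthonormal Haar DWT: returns (coarse approximation, details per level)."""
    approx, details = list(x), []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / sqrt(2) for a, b in pairs])
        approx = [(a + b) / sqrt(2) for a, b in pairs]
    return approx, details

def energy(v):
    return sum(c * c for c in v)

x = [4.0, 2.0, -1.0, 3.0, 0.5, 0.0, -2.0, 1.0]   # arbitrary test samples
c, d = haar_dwt(x, levels=3)

signal_energy = energy(x)
per_scale = [energy(dj) for dj in d]              # finest level first
transform_energy = energy(c) + sum(per_scale)
print("signal energy   :", signal_energy)
print("transform energy:", transform_energy)
print("tiles per level :", [len(dj) for dj in d])
```

Each level has half as many coefficients as the one before it, which is the discrete counterpart of tiles that double in width as the scale becomes coarser.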


Nonstationary Signal Analysis 


In many applications, one studies the decomposition of a signal in terms of basis functions. For 
example, stationary signals are decomposed into the Fourier basis using the Fourier transform. For 
nonstationary signals (i.e., signals whose frequency characteristics are time-varying like music, 
speech, images, etc.) the Fourier basis is ill-suited because of the poor time-localization. The classical 
solution to this problem is to use the short-time (or windowed) Fourier transform (STFT). However, 
the STFT has several problems, the most severe being the fixed time-frequency resolution of the basis 
functions. Wavelet techniques give a new class of (potentially signal dependent) bases that have 
desired time-frequency resolution properties. The “optimal” decomposition depends on the signal (or 
class of signals) studied. All classical time-frequency decompositions like the Discrete STFT 
(DSTFT), however, are signal independent. Each function in a basis can be considered schematically 
as a tile in the time-frequency plane, where most of its energy is concentrated. Orthonormality of the 
basis functions can be schematically captured by nonoverlapping tiles. With this assumption, the 
time-frequency tiles for the standard basis (i.e., delta basis) and the Fourier basis (i.e., sinusoidal 
basis) are shown in [link]. 




(a) Dirac Delta Function or Standard Time Domain Basis (b) Fourier or Standard Frequency 
Domain Basis 


Tiling with the Discrete-Time Short-Time Fourier Transform 


The DSTFT basis functions are of the form 
Equation: 

w_{j,k}(t) = w(t - k\tau_0) \, e^{-i j \omega_0 t} 

where w(t) is a window function [link]. If these functions form an orthogonal (orthonormal) basis, 
x(t) = Σ_{j,k} ⟨x, w_{j,k}⟩ w_{j,k}(t). The DSTFT coefficients, ⟨x, w_{j,k}⟩, estimate the presence of signal 
components centered at (kτ_0, jω_0) in the time-frequency plane, i.e., the DSTFT gives a uniform 
tiling of the time-frequency plane with the basis functions {w_{j,k}(t)}. If Δ_t and Δ_ω are time and 
frequency resolutions respectively of w(t), then the uncertainty principle demands that Δ_t Δ_ω ≥ 1/2 
[link], [link]. Moreover, if the basis is orthonormal, the Balian-Low theorem implies either Δ_t or Δ_ω 
is infinite. Both Δ_t and Δ_ω can be controlled by the choice of w(t), but for any particular choice, 
there will be signals for which either the time or frequency resolution is not adequate. [link] shows 
the time-frequency tiles associated with the STFT basis for a narrow and wide window, illustrating 
the inherent time-frequency trade-offs associated with this basis. Notice that the tiling schematic 
holds for several choices of windows (i.e., each figure represents all DSTFT bases with the particular 
time-frequency resolution characteristic). 
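
The uncertainty bound can be examined numerically for specific windows. The sketch below assumes that the time and frequency resolutions are measured as root-mean-square widths of a real window centered at t = 0 (one common definition, not necessarily the one intended in the references), and it evaluates the frequency width in the time domain through Parseval's relation. The function name and the two test windows are choices made for this example.

```python
from math import cos, exp, pi, sin, sqrt

def rms_widths(w, dw, t_min, t_max, steps=20000):
    """RMS time width and RMS frequency width of a real window centered at 0.

    The frequency width uses the identity
    integral(omega^2 |W|^2) / integral(|W|^2) = integral(w'^2) / integral(w^2),
    so the derivative dw of the window must be supplied.
    """
    h = (t_max - t_min) / steps
    e = t2 = d2 = 0.0
    for i in range(steps + 1):
        t = t_min + i * h
        e += w(t) ** 2 * h
        t2 += (t * w(t)) ** 2 * h
        d2 += dw(t) ** 2 * h
    return sqrt(t2 / e), sqrt(d2 / e)

# Gaussian window: expected to sit at the lower bound.
gauss = rms_widths(lambda t: exp(-t * t / 2),
                   lambda t: -t * exp(-t * t / 2), -10.0, 10.0)
# Raised-cosine (Hann) window on [-1, 1]: expected to lie above the bound.
hann = rms_widths(lambda t: 0.5 * (1 + cos(pi * t)),
                  lambda t: -0.5 * pi * sin(pi * t), -1.0, 1.0)

gauss_product = gauss[0] * gauss[1]
hann_product = hann[0] * hann[1]
print("Gaussian product:", gauss_product)
print("Hann product    :", hann_product)
```

Stretching either window in time scales its two widths in opposite directions and leaves the product unchanged, which is the trade-off between the narrow-window and wide-window tilings.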




(a) STFT Basis - Narrow Window. (b) STFT Basis - Wide Window. 


Tiling with the Discrete Two-Band Wavelet Transform 


The discrete wavelet transform (DWT) is another signal-independent tiling of the time-frequency 
plane suited for signals where high frequency signal components have shorter duration than low 
frequency signal components. Time-frequency atoms for the DWT, {ψ_{j,k}(t)} = {2^{j/2} ψ(2^j t - k)}, 
are obtained by translates and scales of the wavelet function ψ(t). One shrinks/stretches the wavelet 
to capture high-/low-frequency components of the signal. If these atoms form an orthonormal basis, 
then x(t) = Σ_{j,k} ⟨x, ψ_{j,k}⟩ ψ_{j,k}(t). The DWT coefficients, ⟨x, ψ_{j,k}⟩, are a measure of the energy of 
the signal components located at (2^{-j} k, 2^{j}) in the time-frequency plane, giving yet another tiling of 
the time-frequency plane. As discussed in Chapter: Filter Banks and the Discrete Wavelet Transform 
and Chapter: Filter Banks and Transmultiplexers, the DWT (for compactly supported wavelets) can 
be efficiently computed using two-channel unitary FIR filter banks [link]. [link] shows the 
corresponding tiling description which illustrates time-frequency resolution properties of a DWT 
basis. If you look along the frequency (or scale) axis at some particular time (translation), you can 
imagine seeing the frequency response of the filter bank as shown in [link] with the logarithmic 
bandwidth of each channel. Indeed, each horizontal strip in the tiling of [link] corresponds to each 
channel, which in turn corresponds to a scale j. The location of the tiles corresponding to each 
coefficient is shown in [link]. If at a particular scale, you imagine the translations along the k axis, 
you see the construction of the components of a signal at that scale. This makes it obvious that at 
lower resolutions (smaller j) the translations are large and at higher resolutions the translations are 
small. 


Two-band Wavelet Basis 


The tiling of the time-frequency plane is a powerful graphical method for understanding the 
properties of the DWT and for analyzing signals. For example, if the signal being analyzed were a 
single wavelet itself, of the form 

Equation: 

f(t) = \psi(4t - 2), 

the DWT would have only one nonzero coefficient, d_2(2). To see that the DWT is not time (or shift) 
invariant, imagine shifting f(t) some noninteger amount and you see the DWT changes considerably. 
If the shift is some integer, the energy stays the same in each scale, but it “spreads out” along more 
values of k and spreads differently in each scale. If the shift is not an integer, the energy spreads in 
both j and k. There is no such thing as a “scale limited” signal corresponding to a band-limited 
(Fourier) signal if arbitrary shifting is allowed. For integer shifts, there is a corresponding concept 
[link]. 
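
A discrete analogue of this shift dependence is easy to reproduce. The sketch below is an illustration under assumptions not made in the text: it uses the orthonormal Haar transform on eight samples, takes a sampled Haar wavelet as the signal, and shifts it circularly by one sample, which is a fraction of the translation step at the coarser levels. The function name is hypothetical.

```python
from math import sqrt

def haar_detail_energy(x):
    """Energy of the Haar detail coefficients at each level, finest level first."""
    approx, energies = list(x), []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        energies.append(sum(((a - b) / sqrt(2)) ** 2 for a, b in pairs))
        approx = [(a + b) / sqrt(2) for a, b in pairs]
    return energies

wavelet = [0.5, 0.5, -0.5, -0.5, 0.0, 0.0, 0.0, 0.0]   # one unit-energy Haar wavelet
shifted = wavelet[-1:] + wavelet[:-1]                   # circular shift by one sample

original_energy = haar_detail_energy(wavelet)
shifted_energy = haar_detail_energy(shifted)
print("original:", original_energy)   # all energy in a single level
print("shifted :", shifted_energy)    # energy spread over several levels
```

The total energy is the same in both cases; only its distribution over the levels changes.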


Relation of DWT Coefficients d_{j,k} to Tiles 


General Tiling 


Notice that for general, nonstationary signal analysis, one desires methods for controlling the tiling of 
the time-frequency plane, not just using the two special cases above (their importance 
notwithstanding). An alternative way to obtain orthonormal wavelets ψ(t) is using unitary FIR filter 
bank (FB) theory. That will be done with M-band DWTs, wavelet packets, and time-varying wavelet 
transforms addressed in Section: Multiplicity-M (M-Band) Scaling Functions and Wavelets, 
Section: Wavelet Packets and Chapter: Filter Banks and Transmultiplexers respectively. 


Remember that the tiles represent the relative size of the translations and scale change. They do not 
literally mean the partitioned energy is confined to the tiles. Representations with similar tilings can 
have very different characteristics. 


Multiplicity-M (M-Band) Scaling Functions and Wavelets 


While the use of a scale multiplier M of two in [link] or [link] fits many problems, coincides with the 
concept of an octave, gives a binary tree for the Mallat fast algorithm, and gives the constant-Q or 
logarithmic frequency bandwidths, the conditions given in Chapter: The Scaling Function and 
Scaling Coefficients, Wavelet and Wavelet Coefficients and Section: Further Properties of the Scaling 
Function and Wavelet can be stated and proved in a more general setting where the basic scaling 
equation uses a general scale factor or multiplier M [link], [link], [link], [link], [link], [link], [link] 
rather than the specific doubling value of 
M = 2. Part of the motivation for a larger M comes from a desire to have a more flexible tiling of 
the time-scale plane than that resulting from the M = 2 wavelet or the short-time Fourier transform 
discussed in Section: Tiling the Time—Frequency or Time—Scale Plane. It also comes from a desire for 
some regions of uniform band widths rather than the logarithmic spacing of the frequency responses, 
and from filter bank theory, which is discussed in Chapter: Filter Banks and Transmultiplexers. 


We pose the more general multiresolution formulation where [link] becomes 
Equation: 

\varphi(t) = \sum_{n} h(n) \, \sqrt{M} \, \varphi(Mt - n) . 

In some cases, M may be allowed to be a rational number; however, in most cases it must be an 
integer, and in [link] it is required to be 2. In the frequency domain, this relationship becomes 
Equation: 

\Phi(\omega) = \frac{1}{\sqrt{M}} \, H(\omega/M) \, \Phi(\omega/M) 

and the limit after iteration is 
Equation: 

\Phi(\omega) = \prod_{k=1}^{\infty} \left\{ \frac{1}{\sqrt{M}} \, H\!\left( \frac{\omega}{M^k} \right) \right\} \Phi(0) 

assuming the product converges and Φ(0) is well defined. This is a generalization of [link] and is 
derived in [link]. 
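
The infinite product can be approximated by truncation. The sketch below assumes the convention H(ω) = Σ_n h(n) e^{-iωn}, normalizes Φ(0) = 1, and uses the simplest possible filters, h(n) = 1/√M for n = 0, ..., M - 1, whose scaling function is the unit box for every M. These filters and the function name are assumptions for illustration; the box has the known spectrum magnitude |sin(ω/2)/(ω/2)|, which gives something to compare against.

```python
import cmath
from math import sin, sqrt

def scaling_spectrum(h, M, w, terms=40):
    """Truncated product prod_{k=1}^{terms} H(w / M^k) / sqrt(M), taking Phi(0) = 1."""
    value = 1.0 + 0.0j
    for k in range(1, terms + 1):
        wk = w / M ** k
        H = sum(c * cmath.exp(-1j * wk * n) for n, c in enumerate(h))
        value *= H / sqrt(M)
    return value

w = 2.0
box2 = [1 / sqrt(2)] * 2     # two-band box (Haar) filter
box3 = [1 / sqrt(3)] * 3     # three-band box filter

phi2 = abs(scaling_spectrum(box2, 2, w))
phi3 = abs(scaling_spectrum(box3, 3, w))
expected = abs(sin(w / 2) / (w / 2))
print(phi2, phi3, expected)
```

Both multipliers lead to the same spectrum here because both filters generate the same box function; for other filters the product generally depends on M.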


Properties of M-Band Wavelet Systems 


These theorems, relationships, and properties are generalizations of those given in Chapter: The 
Scaling Function and Scaling Coefficients, Wavelet and Wavelet Coefficients and Section: Further 
Properties of the Scaling Function and Wavelet with some outline proofs or derivations given in the 
Appendix. For the multiplicity-M problem, if the support of the scaling function and wavelets and 
their respective coefficients is finite and the system is orthogonal or a tight frame, the length of the 
scaling function vector or filter h(n) is a multiple of the multiplier M. This is N = M G, where 
Resnikoff and Wells [link] call M the rank of the system and G the genus. 


The results of [link], [link], [link], and [link] become 


Theorem 28 If φ(t) is an L¹ solution to [link] and ∫ φ(t) dt ≠ 0, then 
Equation: 

\sum_{n} h(n) = \sqrt{M} . 

This is a generalization of the basic multiplicity-2 result in [link] and does not depend on any 
particular normalization or orthogonality of φ(t). 


Theorem 29 If integer translates of the solution to [link] are orthogonal, then 
Equation: 

\sum_{n} h(n + Mm) \, h(n) = \delta(m) . 

This is a generalization of [link] and also does not depend on any normalization. An interesting 
corollary of this theorem is 


Corollary 3 If integer translates of the solution to [link] are orthogonal, then 
Equation: 

\sum_{n} |h(n)|^2 = 1 . 

A second corollary to this theorem is 


Corollary 4 If integer translates of the solution to [link] are orthogonal, then 
Equation: 

\sum_{n} h(Mn + m) = 1/\sqrt{M}, \qquad m \in \mathbf{Z} . 


This is also true under weaker conditions than orthogonality as was discussed for the M = 2 case. 
Using the Fourier transform, the following relations can be derived: 


Theorem 30 If φ(t) is an L¹ solution to [link] and ∫ φ(t) dt ≠ 0, then 
Equation: 

H(0) = \sqrt{M} 

which is a frequency domain existence condition. 


Theorem 31 The integer translates of the solution to [link] are orthogonal if and only if 
Equation: 

\sum_{\ell} |\Phi(\omega + 2\pi\ell)|^2 = 1 . 


Theorem 32 If φ(t) is an L¹ solution to [link] and ∫ φ(t) dt ≠ 0, then 
Equation: 

\sum_{n} h(n + Mm) \, h(n) = \delta(m) 

if and only if 
Equation: 

|H(\omega)|^2 + |H(\omega + 2\pi/M)|^2 + |H(\omega + 4\pi/M)|^2 + \cdots + |H(\omega + 2\pi(M-1)/M)|^2 = M . 

This is a frequency domain orthogonality condition on h(n). 


Corollary 5 
Equation: 

H(2\pi\ell/M) = 0, \qquad \text{for } \ell = 1, 2, \cdots, M-1 

which is a generalization of [link] stating where the zeros of H(ω), the frequency response of the 
scaling filter, are located. This is an interesting constraint on just where certain zeros of H(z) must be 
located. 
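
The time-domain and frequency-domain conditions above can be checked together for a given scaling filter. The sketch below is illustrative: it assumes the convention H(ω) = Σ_n h(n) e^{-iωn} with h[0] at n = 0 (a shift of the index origin does not affect any of the tested quantities), uses a three-band box filter chosen for this example, and reuses the length-4 two-band coefficients tabulated earlier in this chapter. All function names are hypothetical.

```python
import cmath
from math import pi, sqrt

def freq_response(h, w):
    """H(w) = sum_n h(n) exp(-i w n), with h[0] taken as n = 0."""
    return sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(h))

def shifted_product(h, M, m):
    """sum_n h(n + M m) h(n) over the finite support of h."""
    return sum(h[n + M * m] * h[n]
               for n in range(len(h)) if 0 <= n + M * m < len(h))

def check_scaling_filter(h, M, tol=1e-9, w_test=0.7):
    shifts = range(-(len(h) // M), len(h) // M + 1)
    power = sum(abs(freq_response(h, w_test + 2 * pi * l / M)) ** 2
                for l in range(M))
    return {
        "sum equals sqrt(M)": abs(sum(h) - sqrt(M)) < tol,
        "shift-by-M orthogonality": all(
            abs(shifted_product(h, M, m) - (1.0 if m == 0 else 0.0)) < tol
            for m in shifts),
        "polyphase sums equal 1/sqrt(M)": all(
            abs(sum(h[m::M]) - 1 / sqrt(M)) < tol for m in range(M)),
        "zeros at 2 pi l / M": all(
            abs(freq_response(h, 2 * pi * l / M)) < tol for l in range(1, M)),
        "magnitude-squared sum equals M": abs(power - M) < tol,
    }

box3 = [1 / sqrt(3)] * 3   # three-band box filter (assumed example)
len4 = [0.224143868042, 0.836516303737, 0.482962913144, -0.129409522551]

for name, h, M in (("three-band box", box3, 3), ("length-4, M = 2", len4, 2)):
    print(name, check_scaling_filter(h, M))
```

The magnitude-squared condition is sampled at a single arbitrary frequency here, so a passing result is a spot check rather than a proof.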


Theorem 33 If Σ_n h(n) = √M, and h(n) has finite support or decays fast enough, then a 
φ(t) ∈ L² that satisfies [link] exists and is unique. 


Theorem 34 If Σ_n h(n) = √M and if Σ_n h(n) h(n - Mk) = δ(k), then φ(t) exists, is 
integrable, and generates a wavelet system that is a tight frame in L². 


These results are a significant generalization of the basic M = 2 wavelet system that we discussed in 
the earlier chapters. The definitions, properties, and generation of these more general scaling 
functions have the same form as for M = 2, but there is no longer a single wavelet associated with 
the scaling function. There are M - 1 wavelets. In addition to [link] we now have M - 1 wavelet 
equations, which we denote as 

Equation: 

\psi_\ell(t) = \sum_{n} h_\ell(n) \, \sqrt{M} \, \varphi(Mt - n) 

for 
Equation: 

\ell = 1, 2, \cdots, M - 1 . 

Some authors use a notation h_0(n) for h(n) and φ_0(t) for φ(t), so that h_ℓ(n) represents the 
coefficients for the scaling function and all the wavelets and φ_ℓ(t) represents the scaling function 
and all the wavelets. 


Just as for the M = 2 case, the multiplicity-M scaling function and scaling coefficients are unique 
and are simply the solution of the basic recursive or refinement equation [link]. However, the 
wavelets and wavelet coefficients are no longer unique or easy to design in general. 


We now have the possibility of a more general and more flexible multiresolution expansion system 
with the M-band scaling function and wavelets. There are now M - 1 signal spaces spanned by the 
M - 1 wavelets at each scale j. They are denoted 

Equation: 

W_{\ell,j} = \operatorname{Span}_{k} \left\{ \psi_\ell(M^j t + k) \right\} 

for ℓ = 1, 2, ..., M - 1. For example with M = 4, 

Equation: 

V_1 = V_0 \oplus W_{1,0} \oplus W_{2,0} \oplus W_{3,0} 

and 
Equation: 

V_2 = V_1 \oplus W_{1,1} \oplus W_{2,1} \oplus W_{3,1} 

or 
Equation: 

V_2 = V_0 \oplus W_{1,0} \oplus W_{2,0} \oplus W_{3,0} \oplus W_{1,1} \oplus W_{2,1} \oplus W_{3,1} . 

In the limit as j → ∞, we have 
Equation: 

L^2 = V_0 \oplus W_{1,0} \oplus W_{2,0} \oplus W_{3,0} \oplus W_{1,1} \oplus W_{2,1} \oplus W_{3,1} \oplus \cdots \oplus W_{3,\infty} . 

Our notation for M = 2 in Chapter: A multiresolution formulation of Wavelet Systems is 
W_{1,j} = W_j. 


This is illustrated pictorially in [link] where we see the nested scaling function spaces V_j but each 
annular ring is now divided into M - 1 subspaces, each spanned by the M - 1 wavelets at that 
scale. Compare [link] with Figure: Scaling Function and Wavelet Vector Spaces for the classical 
M = 2 case. 


Vector Space Decomposition for a Four-Band Wavelet System, W_{ℓ,j} 


The expansion of a signal or function in terms of the M-band wavelets now involves a triple sum over 
ℓ, j, and k. 
Equation: 

f(t) = \sum_{k} c(k) \, \varphi_k(t) + \sum_{k=-\infty}^{\infty} \sum_{j=0}^{\infty} \sum_{\ell=1}^{M-1} M^{j/2} \, d_{\ell,j}(k) \, \psi_\ell(M^j t - k) 

where the expansion coefficients (DWT) are found by 
Equation: 

c(k) = \int f(t) \, \varphi(t - k) \, dt 

and 
Equation: 

d_{\ell,j}(k) = \int f(t) \, M^{j/2} \, \psi_\ell(M^j t - k) \, dt . 

We now have an M-band discrete wavelet transform. 


Theorem 35 If the scaling function φ(t) satisfies the conditions for existence and orthogonality and 
the wavelets are defined by [link] and if the integer translates of these wavelets span W_{ℓ,0}, the 
orthogonal complements of V_0, all being in V_1, i.e., the wavelets are orthogonal to the scaling 
function at the same scale; that is, if 

Equation: 

\int \varphi(t - n) \, \psi_\ell(t - m) \, dt = 0 

for ℓ = 1, 2, ..., M - 1, then 
Equation: 

\sum_{n} h(n) \, h_\ell(n - Mk) = 0 

for all integers k and for ℓ = 1, 2, ..., M - 1. 


Combining [link] and [link] and calling h_0(n) = h(n) gives 
Equation: 

\sum_{n} h_m(n) \, h_\ell(n - Mk) = \delta(k) \, \delta(m - \ell) 

as necessary conditions on h_ℓ(n) for an orthogonal system. 
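
The combined condition can be tested mechanically for any finite set of filters. In the sketch below, the two-band pair is the length-4 h(n) and h_1(n) tabulated earlier in this chapter; the four-band set built from the rows of a 4 x 4 Hadamard matrix is an assumed example that does not come from the text, and the function names are hypothetical.

```python
def cross_product(a, b, M, k):
    """sum_n a(n) b(n - M k) for finite filters sharing the same index origin."""
    return sum(a[n] * b[n - M * k]
               for n in range(len(a)) if 0 <= n - M * k < len(b))

def is_orthogonal_bank(filters, M, tol=1e-9):
    """Check sum_n h_m(n) h_l(n - M k) = delta(k) delta(m - l) for all m, l, k."""
    length = max(len(f) for f in filters)
    for m, hm in enumerate(filters):
        for l, hl in enumerate(filters):
            for k in range(-(length // M), length // M + 1):
                target = 1.0 if (k == 0 and m == l) else 0.0
                if abs(cross_product(hm, hl, M, k) - target) > tol:
                    return False
    return True

# Two-band pair: the length-4 h(n) and h_1(n) tabulated earlier in this chapter.
two_band = [
    [0.224143868042, 0.836516303737, 0.482962913144, -0.129409522551],
    [0.129409522551, 0.482962913144, -0.836516303737, 0.224143868042],
]

# Four-band set (assumed example): rows of a 4 x 4 Hadamard matrix divided by 2.
four_band = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
    [0.5, 0.5, -0.5, -0.5],
    [0.5, -0.5, -0.5, 0.5],
]

print(is_orthogonal_bank(two_band, 2), is_orthogonal_bank(four_band, 4))
```

Reordering or sign-flipping the last three rows of the four-band set gives other valid choices for the same scaling filter, a small instance of the non-uniqueness of the wavelets for M > 2.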


Filter Bank Structure for a Four-Band Wavelet System, W_{ℓ,j} 


Unlike the M = 2 case, for M > 2 there is no formula for h_ℓ(n) and there are many possible 
wavelets for a given scaling function. 


Mallat's algorithm takes on a more complex form as shown in [link]. The advantage is a more flexible 
system that allows a mixture of linear and logarithmic tiling of the time-scale plane. A powerful tool 
that removes the ambiguity is choosing the wavelets by “modulated cosine" design. 


[link] shows the frequency response of the filter bank, much as Figure: Frequency Bands for the
Analysis Tree did for M = 2. Examples of scaling functions and wavelets are illustrated in [link], 
and the tiling of the time-scale plane is shown in [link]. [link] shows the time-frequency resolution 
characteristics of a four-band DWT basis. Notice how it is different from the Standard, Fourier, 
DSTFT and two-band DWT bases shown in earlier chapters. It gives a mixture of a logarithmic and 
linear frequency resolution. 

Frequency Responses for the Four-Band Filter Bank, $W_{\ell j}$


A Four-Band Six-Regular Wavelet System: $\Phi$


4-band Wavelet Basis Tiling 


We next define the $k^{th}$ moments of $\psi_\ell(t)$ as
Equation:

$$m_\ell(k) = \int t^k\, \psi_\ell(t)\, dt$$

and the $k^{th}$ discrete moments of $h_\ell(n)$ as
Equation:

$$\mu_\ell(k) = \sum_n n^k\, h_\ell(n).$$


Theorem 36 (Equivalent Characterizations of K-Regular M-Band Filters) A unitary scaling filter 
is K-regular if and only if the following equivalent statements are true: 


1. All moments of the wavelet filters are zero, $\mu_\ell(k) = 0$, for $k = 0, 1, \cdots, (K-1)$ and for
$\ell = 1, 2, \cdots, (M-1)$.

2. All moments of the wavelets are zero, $m_\ell(k) = 0$, for $k = 0, 1, \cdots, (K-1)$ and for
$\ell = 1, 2, \cdots, (M-1)$.

3. The partial moments of the scaling filter are equal for $k = 0, 1, \cdots, (K-1)$.

4. The frequency response of the scaling filter has zeros of order $K$ at the $M^{th}$ roots of unity,
$\omega = 2\pi\ell/M$ for $\ell = 1, 2, \cdots, M-1$.

5. The magnitude-squared frequency response of the scaling filter is flat to order $2K$ at $\omega = 0$. This
follows from [link].

6. All polynomial sequences up to degree $(K-1)$ can be expressed as a linear combination of
integer-shifted scaling filters.

7. All polynomials of degree up to $(K-1)$ can be expressed as a linear combination of
integer-shifted scaling functions for all $j$.


This powerful result [link], [link] is similar to the $M = 2$ case presented in Chapter: Regularity,
Moments, and Wavelet System Design. It not only ties the number of zero moments to the regularity
but also to the degree of polynomials that can be exactly represented by a sum of weighted and
shifted scaling functions. Note the location of the zeros of $H(z)$ are equally spaced around the unit
circle, resulting in a narrower frequency response than for the half-band filters if $M = 2$. This is
consistent with the requirements given in [link] and illustrated in [link].
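
To make statements 1 and 4 of the theorem concrete, here is a small numerical sketch. It assumes the simplest case for which a closed-form filter is well known, the two-band ($M = 2$) Daubechies length-4 scaling filter with $K = 2$, and the usual alternating-flip wavelet filter; the function names are illustrative only. The same checks carry over to an M-band design once its filters are available.

```python
import numpy as np

s3 = np.sqrt(3.0)
# Daubechies length-4 scaling filter (M = 2, K = 2), normalized so that sum(h) = sqrt(2).
h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2))
n = np.arange(len(h))
# Wavelet filter by the M = 2 alternating flip, g(n) = (-1)^n h(N - 1 - n).
g = (-1.0) ** n * h[::-1]

def discrete_moment(f, k):
    """k-th discrete moment, sum_n n^k f(n)."""
    idx = np.arange(len(f))
    return float(np.sum(idx ** k * f))

def H(w):
    """Frequency response of the scaling filter, sum_n h(n) exp(-i w n)."""
    return np.sum(h * np.exp(-1j * w * n))

# Statement 1: the first K = 2 moments of the wavelet filter vanish, the next does not.
moments = [discrete_moment(g, k) for k in range(3)]

# Statement 4: H has a zero of order K = 2 at w = pi (value and first derivative vanish).
H_at_pi = H(np.pi)
dH_at_pi = np.sum(-1j * n * h * np.exp(-1j * np.pi * n))
```

Here `moments[0]` and `moments[1]` are zero to rounding error while `moments[2]` is not, which is what $K = 2$ means for this filter.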


Sketches of some of the derivations in this section are given in the Appendix or are simple extensions 
of the M = 2 case. More details are given in [link], [link], [link]. 


M-Band Scaling Function Design 


Calculating values of $\varphi(n)$ can be done by the same methods given in Section: Calculating the Basic
Scaling Function and Wavelet. The design of the scaling coefficients $h(n)$ parallels that for
the two-band case but is somewhat more difficult [link].


One special set of cases turns out to be a simple extension of the two-band system. If the multiplier
$M = 2^m$, then the scaling function is simply a scaled version of the $M = 2$ case and a particular set
of corresponding wavelets is obtained by iterating the wavelet branches of the Mallat
algorithm tree as is done for wavelet packets described in [link]. For other values of $M$, especially
odd values, the situation is more complex.
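
The $M = 2^m$ case can be sketched numerically for $M = 4$. The example below assumes the two-band Daubechies length-4 filters and forms the four equivalent filters of a two-stage full tree, $a(z)\,b(z^2)$, by convolving a first-stage filter with an upsampled second-stage filter. The helper names and the choice of filter are assumptions made for illustration.

```python
import numpy as np

s3 = np.sqrt(3.0)
h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2))  # two-band scaling filter
g = (-1.0) ** np.arange(4) * h[::-1]                               # two-band wavelet filter

def upsample2(f):
    """Insert a zero between successive samples of f."""
    out = np.zeros(2 * len(f) - 1)
    out[::2] = f
    return out

# Equivalent four-band filters: first-stage filter a followed by second-stage filter b.
# Row 0 (lowpass, lowpass) is the four-band scaling filter.
bank = np.array([np.convolve(a, upsample2(b)) for a in (h, g) for b in (h, g)])

def gram(bank, M, k, max_k=3):
    """Matrix with entries sum_n h_m(n) h_l(n - M k), computed with zero padding."""
    pad = M * max_k
    padded = np.pad(bank, ((0, 0), (pad, pad)))
    return padded @ np.roll(padded, M * k, axis=1).T

# Orthonormality under shifts by M = 4: identity for k = 0, zero otherwise.
ok = all(np.allclose(gram(bank, 4, k), np.eye(4) if k == 0 else 0.0)
         for k in range(-3, 4))
```

The scaling filter `bank[0]` sums to $\sqrt{M} = 2$, and `ok` confirms that the iterated two-band system satisfies the four-band orthogonality conditions.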


M-Band Wavelet Design and Cosine Modulated Methods 


For $M > 2$ the wavelet coefficients $h_\ell(n)$ are not uniquely determined by the scaling coefficients, as
was the case for $M = 2$. This is both a blessing and a curse. It gives us more flexibility in designing
specific systems, but it complicates the design considerably. For small $N$ and $M$, the designs can be
done directly, but for longer lengths and/or for large $M$, direct design becomes impossible and
something like the cosine modulated design of the wavelets from the scaling function, as described in
Chapter: Filter Banks and Transmultiplexers, is probably the best approach [link], [link], [link],
[link], [link], [link], [link], [link], [link], [link], [link], [link], [link], [link].


Wavelet Packets 


The classical $M = 2$ wavelet system results in a logarithmic frequency resolution. The low
frequencies have narrow bandwidths and the high frequencies have wide bandwidths, which is
appropriate for some applications but not all. The wavelet packet system was proposed by Ronald
Coifman [link], [link] to allow a finer and adjustable resolution of frequencies at high frequencies. It
also gives a rich structure that allows adaptation to particular signals or signal classes. The cost of this
richer structure is a computational complexity of $O(N \log(N))$, similar to the FFT, in contrast to the
classical wavelet transform which is $O(N)$.


Full Wavelet Packet Decomposition 


In order to generate a basis system that would allow a higher resolution decomposition at high 
frequencies, we will iterate (split and down-sample) the highpass wavelet branch of the Mallat 
algorithm tree as well as the lowpass scaling function branch. Recall that for the discrete wavelet 
transform we repeatedly split, filter, and decimate the lowpass bands. The resulting three-scale 
analysis tree (three-stage filter bank) is shown in Figure: Three-Stage Two-Band Analysis Tree. This 
type of tree results in a logarithmic splitting of the bandwidths and tiling of the time-scale plane, as 
shown in [link]. 


If we split both the lowpass and highpass bands at all stages, the resulting filter bank structure is like
a full binary tree as in [link]. It is this full tree that takes $O(N \log N)$ calculations and results in a
completely evenly spaced frequency resolution. In fact, its structure is somewhat similar to the FFT
algorithm. Notice the meaning of the subscripts on the signal spaces. The first integer subscript is the
scale $j$ of that space as illustrated in [link]. Each following subscript is a zero or one, depending on the
path taken through the filter bank illustrated in Figure: Three-Stage Two-Band Analysis Tree. A
"zero" indicates going through a lowpass filter (scaling function decomposition) and a "one" indicates
going through a highpass filter (wavelet decomposition). This is different from the convention for the
$M > 2$ case in [link].
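
The full-tree splitting can be sketched in a few lines. The example below is a minimal illustration that assumes Haar filters and a signal whose length is a power of two; the function names `haar_split` and `full_packet_tree` are hypothetical and not taken from the text.

```python
import numpy as np

def haar_split(x):
    """One two-band analysis stage: lowpass and highpass outputs, each downsampled by 2."""
    x = np.asarray(x, dtype=float)
    low = (x[0::2] + x[1::2]) / np.sqrt(2)
    high = (x[0::2] - x[1::2]) / np.sqrt(2)
    return low, high

def full_packet_tree(x, levels):
    """Return tree[j] = list of the 2**j subbands after j stages (tree[0] = [x])."""
    tree = [[np.asarray(x, dtype=float)]]
    for _ in range(levels):
        next_level = []
        for band in tree[-1]:
            low, high = haar_split(band)      # the "zero" branch, then the "one" branch
            next_level.extend([low, high])
        tree.append(next_level)
    return tree

x = np.arange(8, dtype=float)
tree = full_packet_tree(x, 3)
```

Every level processes all $N$ samples once, so carrying the tree to its full depth of $\log_2 N$ levels costs on the order of $N \log N$ operations, while the ordinary DWT only re-splits the shrinking lowpass band.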

The full binary tree for the three-scale wavelet packet transform.


[link] pictorially shows the signal vector space decomposition for the scaling functions and wavelets. 
[link] shows the frequency response of the packet filter bank much as Figure: Frequency Bands for 
the Analysis Tree did for M = 2 and [link] for M = 3 wavelet systems. 


[link] shows the Haar wavelet packets with which we finish the example started in Section: An
Example of the Haar Wavelet System. This is an informative illustration that shows just what
"packetizing" does to the regular wavelet system. It should be compared to the example at the end
of Chapter: A multiresolution formulation of Wavelet Systems. This is similar to the Walsh-
Hadamard decomposition, and [link] shows the full wavelet packet system generated from the
Daubechies $\varphi_{D8'}$ scaling function. The "prime" indicates this is the Daubechies system with the
spectral factorization chosen such that some zeros are inside the unit circle and some outside. This gives the
maximum symmetry possible with a Daubechies system. Notice the three wavelets have increasing
"frequency." They are somewhat like windowed sinusoids, hence the name, wavelet packet. Compare
the wavelets with the $M = 2$ and $M = 4$ Daubechies wavelets.


Adaptive Wavelet Packet Systems 


Normally we consider the outputs of each channel or band as the wavelet transform and from this 
have a nonredundant basis system. If, however, we consider the signals at the output of each band and 
at each stage or scale simultaneously, we have more outputs than inputs and clearly have a redundant 
system. From all of these outputs, we can choose an independent subset as a basis. This can be done 
in an adaptive way, depending on the signal characteristics according to some optimization criterion. 
One possibility is the regular wavelet decomposition shown in Figure: Frequency Bands for the 
Analysis Tree. 


Vector Space Decomposition for a $M = 2$ Full Wavelet Packet System


Frequency Responses for the Two-Band Wavelet Packet Filter Bank (vertical axis: $|H(\omega)|$)


Another is the full packet decomposition shown in [link]. Any pruning of this full tree would generate 
a valid packet basis system and would allow a very flexible tiling of the time-scale plane. 


We can choose a set of basis vectors and form an orthonormal basis, such that some cost measure on
the transformed coefficients is minimized. Moreover, when the cost is additive, the
best orthonormal wavelet packet transform can be found using a binary searching algorithm [link] in
$O(N \log N)$ time.


The Haar Wavelet Packet
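
A binary search of this kind can be sketched as a bottom-up comparison: a band is kept if its cost is no larger than the combined cost of the best bases of its two children. The sketch below assumes Haar filters and uses the $\ell_1$ norm as a stand-in additive cost; both choices, and the function names, are assumptions for illustration (an entropy-type cost is another common option).

```python
import numpy as np

def haar_split(x):
    """One two-band Haar analysis stage with downsampling by 2."""
    low = (x[0::2] + x[1::2]) / np.sqrt(2)
    high = (x[0::2] - x[1::2]) / np.sqrt(2)
    return low, high

def cost(c):
    """Additive cost of a coefficient vector (here: l1 norm, an illustrative choice)."""
    return float(np.sum(np.abs(c)))

def best_basis(x, levels):
    """Return (best_cost, [(path, coefficients), ...]) for a packet tree of given depth."""
    def search(band, depth, path):
        keep = (cost(band), [(path, band)])
        if depth == levels or len(band) < 2:
            return keep
        low, high = haar_split(band)
        c0, b0 = search(low, depth + 1, path + "0")    # lowpass branch
        c1, b1 = search(high, depth + 1, path + "1")   # highpass branch
        # Additivity lets the children be optimized independently.
        if c0 + c1 < keep[0]:
            return c0 + c1, b0 + b1
        return keep
    return search(np.asarray(x, dtype=float), 0, "")

x = np.array([1.0, -1.0] * 4)          # a purely high-frequency test signal
total_cost, basis = best_basis(x, 3)
paths = [p for p, _ in basis]
```

For this test signal the search leaves the (empty) lowpass band unsplit and keeps splitting the highpass side, which is exactly the kind of tiling the ordinary DWT cannot produce.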


Some examples of the resulting time-frequency tilings are shown in [link]. These plots demonstrate 
the frequency adaptation power of the wavelet packet transform. 




Examples of Time-Frequency Tilings of Different Three-Scale 
Orthonormal Wavelet Packet Transforms. 


There are two approaches to using adaptive wavelet packets. One is to choose a particular 
decomposition (filter bank pruning) based on the characteristics of the class of signals to be 
processed, then to use the transform nonadaptively on the individual signals. The other is to adapt the 
decomposition for each individual signal. The first is a linear process over the class of signals. The 
second is not and will not obey superposition. 


Let $P(J)$ denote the number of different $J$-scale orthonormal wavelet packet transforms. We can
easily see that
Equation:

$$P(J) = P(J-1)^2 + 1, \quad P(1) = 1.$$


So the number of possible choices grows dramatically as the scale increases. This is another reason 
for the wavelet packets to be a very powerful tool in practice. For example, the FBI standard for 
fingerprint image compression [link], [link] is based on wavelet packet transforms. The wavelet 
packets are successfully used for acoustic signal compression [link]. In [link], a rate-distortion 
measure is used with the wavelet packet transform to improve image compression performance. 
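
To see how quickly the number of packet bases grows, the short sketch below evaluates the recursion $P(J) = P(J-1)^2 + 1$ with $P(1) = 1$. The squared term reflects that, once a band is split, the two branches can be decomposed independently of each other. The function name is illustrative.

```python
def count_packet_bases(J):
    """Evaluate P(J) = P(J-1)**2 + 1 with P(1) = 1."""
    p = 1
    for _ in range(J - 1):
        p = p * p + 1
    return p

counts = [count_packet_bases(J) for J in range(1, 7)]
print(counts)  # prints [1, 2, 5, 26, 677, 458330]
```

Since each step roughly squares the count, the number of digits about doubles with every added scale.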


M-band DWTs give a flexible tiling of the time-frequency plane. They are associated with a 
particular tree-structured filter bank, where the lowpass channel at any depth is split into $M$ bands.
Combining the M-band and wavelet packet structure gives a rather arbitrary tree-structured filter 
bank, where all channels are split into sub-channels (using filter banks with a potentially different 
number of bands), and would give a very flexible signal decomposition. The wavelet analog of this is 
known as the wavelet packet decomposition [link]. For a given signal or class of signals, one can, for 
a fixed set of filters, obtain the best (in some sense) filter bank tree-topology. For a binary tree an 
efficient scheme using entropy as the criterion has been developed—the best wavelet packet basis 
algorithm [link], [link]. 


Biorthogonal Wavelet Systems 


Requiring the wavelet expansion system to be orthogonal across both translations and scale gives a 
clean, robust, and symmetric formulation with a Parseval's theorem. It also places strong limitations 
on the possibilities of the system. Requiring orthogonality uses up a large number of the degrees of
freedom, results in complicated design equations, prevents linear phase analysis and synthesis filter
banks, and prevents asymmetric analysis and synthesis systems. This section will develop the 
biorthogonal wavelet system using a nonorthogonal basis and dual basis to allow greater flexibility in 
achieving other goals at the expense of the energy partitioning property that Parseval's theorem states 
[link], [link], [link], [link], [link], [link], [link], [link], [link], [link]. Some researchers have 
considered “almost orthogonal" systems where there is some relaxation of the orthogonal constraints 
in order to improve other characteristics [link]. Indeed, many image compression schemes (including 
the fingerprint compression used by the FBI [link], [link]) use biorthogonal systems. 


Two Channel Biorthogonal Filter Banks 


In previous chapters for orthogonal wavelets, the analysis filters and synthesis filters are time reversals
of each other; i.e., $\tilde h(n) = h(-n)$, $\tilde g(n) = g(-n)$. Here, for the biorthogonal case, we relax these
restrictions. However, in order to perfectly reconstruct the input, these four filters still have to satisfy
a set of relations.


Two Channel Biorthogonal Filter Banks (input $c_1(n)$, output $\tilde c_1(m)$)


Let $c_1(n)$, $n \in \mathbb{Z}$, be the input to the filter banks in [link]; then the outputs of the analysis filter banks
are
Equation:

$$c_0(k) = \sum_n \tilde h(2k - n)\, c_1(n), \qquad d_0(k) = \sum_n \tilde g(2k - n)\, c_1(n).$$

The output of the synthesis filter bank is
Equation:

$$\tilde c_1(m) = \sum_k \left[ h(2k - m)\, c_0(k) + g(2k - m)\, d_0(k) \right].$$

Substituting Equation [link] into [link] and interchanging the summations gives
Equation:

$$\tilde c_1(m) = \sum_n \sum_k \left[ h(2k - m)\, \tilde h(2k - n) + g(2k - m)\, \tilde g(2k - n) \right] c_1(n).$$

For perfect reconstruction, i.e., $\tilde c_1(m) = c_1(m)$, $\forall m \in \mathbb{Z}$, we need
Equation:

$$\sum_k \left[ h(2k - m)\, \tilde h(2k - n) + g(2k - m)\, \tilde g(2k - n) \right] = \delta(m - n).$$
k 


Fortunately, this condition can be greatly simplified. In order for it to hold, the four filters have to be
related as [link]
Equation:

$$\tilde g(n) = (-1)^n h(1 - n), \qquad g(n) = (-1)^n \tilde h(1 - n),$$

up to some constant factors. Thus they are cross-related by time reversal and flipping signs of every
other element. Clearly, when $\tilde h = h$, we get the familiar relations between the scaling coefficients and
the wavelet coefficients for orthogonal wavelets, $g(n) = (-1)^n h(1 - n)$. Substituting [link] back to
[link], we get

Equation:

$$\sum_n \tilde h(n)\, h(n + 2k) = \delta(k).$$

In the orthogonal case, we have $\sum_n h(n)\, h(n + 2k) = \delta(k)$; i.e., $h(n)$ is orthogonal to even
translations of itself. Here $\tilde h$ is orthogonal to even translations of $h$, thus the name biorthogonal.


Equation [link] is the key to the understanding of the biorthogonal filter banks. Let's assume $\tilde h(n)$ is
nonzero when $\tilde N_1 \le n \le \tilde N_2$, and $h(n)$ is nonzero when $N_1 \le n \le N_2$. Equation [link] implies that
[link]
Equation:

$$N_2 - \tilde N_1 = 2k + 1, \qquad \tilde N_2 - N_1 = 2\tilde k + 1, \qquad k, \tilde k \in \mathbb{Z}.$$

In the orthogonal case, this reduces to the well-known fact that the length of $h$ has to be even. [link]
also implies that the difference between the lengths of $\tilde h$ and $h$ must be even. Thus their lengths must
be both even or both odd.
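
These relations can be checked exactly on a small example. The sketch below assumes the spline pair with $h/\sqrt{2} = (1/4, 1/2, 1/4)$ and $\tilde h/\sqrt{2} = (-1/8, 1/4, 3/4, 1/4, -1/8)$ that is tabulated later in this chapter, with both filters centered at $n = 0$ (the indexing is an assumption of this example). Rational arithmetic is used by working with the filters divided by $\sqrt{2}$.

```python
from fractions import Fraction as Fr

# Filters stored as {n: coefficient / sqrt(2)}, centered at n = 0 (assumed indexing).
h = {-1: Fr(1, 4), 0: Fr(1, 2), 1: Fr(1, 4)}
ht = {-2: Fr(-1, 8), -1: Fr(1, 4), 0: Fr(3, 4), 1: Fr(1, 4), 2: Fr(-1, 8)}

def sign(n):
    """(-1)^n for any integer n, kept as an int so the arithmetic stays exact."""
    return -1 if n % 2 else 1

# Cross relations: g~(n) = (-1)^n h(1 - n) and g(n) = (-1)^n h~(1 - n).
gt = {1 - n: sign(1 - n) * c for n, c in h.items()}
g = {1 - n: sign(1 - n) * c for n, c in ht.items()}

def biorth(k):
    """sum_n h~(n) h(n + 2k); the factor 2 undoes the two 1/sqrt(2) scalings."""
    return 2 * sum(c * h.get(n + 2 * k, 0) for n, c in ht.items())

def pr_kernel(m, n, span=8):
    """sum_k [h(2k-m) h~(2k-n) + g(2k-m) g~(2k-n)], again rescaled by 2."""
    total = Fr(0)
    for k in range(-span, span + 1):
        total += h.get(2 * k - m, 0) * ht.get(2 * k - n, 0)
        total += g.get(2 * k - m, 0) * gt.get(2 * k - n, 0)
    return 2 * total

biorthogonal = all(biorth(k) == (1 if k == 0 else 0) for k in range(-3, 4))
perfect = all(pr_kernel(m, n) == (1 if m == n else 0)
              for m in range(-4, 5) for n in range(-4, 5))
# Support rule: N2 - N~1 and N~2 - N1 are both odd (here both equal 3).
odd_supports = (max(h) - min(ht)) % 2 == 1 and (max(ht) - min(h)) % 2 == 1
```

The two filters have lengths 3 and 5, both odd, in agreement with the length rule above.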


Biorthogonal Wavelets 


We now look at the scaling function and wavelet to see how removing orthogonality and introducing 
a dual basis changes their characteristics. We start again with the basic multiresolution definition of 
the scaling function and add to that a similar definition of a dual scaling function. 


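Equation:

$$\varphi(t) = \sum_n h(n)\, \sqrt{2}\, \varphi(2t - n),$$

Equation:

$$\tilde\varphi(t) = \sum_n \tilde h(n)\, \sqrt{2}\, \tilde\varphi(2t - n).$$

From Theorem [link] in Chapter: The Scaling Function and Scaling Coefficients, Wavelet and
Wavelet Coefficients, we know that for $\varphi$ and $\tilde\varphi$ to exist,
Equation:

$$\sum_n h(n) = \sum_n \tilde h(n) = \sqrt{2}.$$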


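Continuing to parallel the construction of the orthogonal wavelets, we also define the wavelet and the
dual wavelet as
Equation:

$$\psi(t) = \sum_n g(n)\, \sqrt{2}\, \varphi(2t - n) = \sum_n (-1)^n\, \tilde h(1 - n)\, \sqrt{2}\, \varphi(2t - n),$$

Equation:

$$\tilde\psi(t) = \sum_n \tilde g(n)\, \sqrt{2}\, \tilde\varphi(2t - n) = \sum_n (-1)^n\, h(1 - n)\, \sqrt{2}\, \tilde\varphi(2t - n).$$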


Now that we have the scaling and wavelet functions and their duals, the question becomes whether 
we can expand and reconstruct arbitrary functions using them. The following theorem [link] answers 
this important question. 


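Theorem 37 For $\tilde h$ and $h$ satisfying [link], suppose that for some $C, \epsilon > 0$,
Equation:

$$|\Phi(\omega)| \le C\,(1 + |\omega|)^{-1/2 - \epsilon}, \qquad |\tilde\Phi(\omega)| \le C\,(1 + |\omega|)^{-1/2 - \epsilon}.$$

If $\Phi$ and $\tilde\Phi$ defined above have sufficient decay in the frequency domain, then
$\psi_{j,k} \stackrel{\mathrm{def}}{=} 2^{j/2}\, \psi(2^j x - k)$, $j, k \in \mathbb{Z}$, constitute a frame in $L^2(\mathbb{R})$. Their dual frame is given by
$\tilde\psi_{j,k} \stackrel{\mathrm{def}}{=} 2^{j/2}\, \tilde\psi(2^j x - k)$, $j, k \in \mathbb{Z}$; for any $f \in L^2(\mathbb{R})$,
Equation:

$$f = \sum_{j,k} \langle f, \psi_{j,k} \rangle\, \tilde\psi_{j,k} = \sum_{j,k} \langle f, \tilde\psi_{j,k} \rangle\, \psi_{j,k}$$

where the series converge strongly.


Moreover, the $\psi_{j,k}$ and $\tilde\psi_{j,k}$ constitute two Riesz bases, with
Equation:

$$\langle \psi_{j,k}, \tilde\psi_{j',k'} \rangle = \delta(j - j')\, \delta(k - k')$$

if and only if
Equation:

$$\int \varphi(x)\, \tilde\varphi(x - k)\, dx = \delta(k).$$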


This theorem tells us that under some technical conditions, we can expand functions using the 
wavelets and reconstruct using their duals. The multiresolution formulations in Chapter: A 
multiresolution formulation of Wavelet Systems can be revised as 


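Equation:

$$\cdots \subset V_{-2} \subset V_{-1} \subset V_0 \subset V_1 \subset V_2 \subset \cdots$$
Equation:

$$\cdots \subset \tilde V_{-2} \subset \tilde V_{-1} \subset \tilde V_0 \subset \tilde V_1 \subset \tilde V_2 \subset \cdots$$
where
Equation:

$$V_j = \underset{k}{\mathrm{Span}}\,\{\varphi_{j,k}\}, \qquad \tilde V_j = \underset{k}{\mathrm{Span}}\,\{\tilde\varphi_{j,k}\}.$$

If [link] holds, we have
Equation:

$$V_j \perp \tilde W_j, \qquad \tilde V_j \perp W_j,$$

where
Equation:

$$W_j = \underset{k}{\mathrm{Span}}\,\{\psi_{j,k}\}, \qquad \tilde W_j = \underset{k}{\mathrm{Span}}\,\{\tilde\psi_{j,k}\}.$$

Although $W_j$ is not the orthogonal complement to $V_j$ in $V_{j+1}$ as before, the dual space $\tilde W_j$ plays the
much needed role. Thus we have four sets of spaces that form two hierarchies to span $L^2(\mathbb{R})$.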


In Section: Further Properties of the Scaling Function and Wavelet, we have a list of properties of the 
scaling function and wavelet that do not require orthogonality. The results for regularity and moments 
in Chapter: Regularity, Moments, and Wavelet System Design can also be generalized to the 
biorthogonal systems. 


Comparisons of Orthogonal and Biorthogonal Wavelets 


The biorthogonal wavelet systems generalize the classical orthogonal wavelet systems. They are 
more flexible and generally easy to design. The differences between the orthogonal and biorthogonal 
wavelet systems can be summarized as follows. 


• The orthogonal wavelet filter and scaling filter must be of the same length, and the length must
be even. This restriction has been greatly relaxed for biorthogonal systems.

• Symmetric wavelets and scaling functions are possible in the framework of biorthogonal
wavelets. Actually, this is one of the main reasons to choose biorthogonal wavelets over the
orthogonal ones.

• Parseval's theorem no longer holds in biorthogonal wavelet systems; i.e., the norm of the
coefficients is not the same as the norm of the functions being spanned. This is one of the main
disadvantages of using the biorthogonal systems. Many design efforts have been devoted to
making the systems near orthogonal, so that the norms are close.

• In a biorthogonal system, if we switch the roles of the primary and the dual, the overall system is
still sound. Thus we can choose the best arrangement for our application. For example, in image
compression, we would like to use the smoother one of the pair to reconstruct the coded image
to get better visual appearance.

• In statistical signal processing, white Gaussian noise remains white after orthogonal transforms.
If the transforms are nonorthogonal, the noise becomes correlated or colored. Thus, when
biorthogonal wavelets are used in estimation and detection, we might need to adjust the
algorithm to better address the colored noise.


Example Families of Biorthogonal Systems 


Because biorthogonal wavelet systems are very flexible, there are a wide variety of approaches to
design different biorthogonal systems. The key is to design a pair of filters $h$ and $\tilde h$ that satisfy [link]
and [link] and have other desirable characteristics. Here we review several families of biorthogonal
wavelets and discuss their properties and design methods.


Cohen-Daubechies-Feauveau Family of Biorthogonal Spline Wavelets 


Splines have been widely used in approximation theory and numerical algorithms. Therefore, they 
may be desirable scaling functions, since they are symmetric, smooth, and have dyadic filter 
coefficients (see Section: Example Scaling Functions and Wavelets). However, if we use them as 
scaling functions in orthogonal wavelet systems, the wavelets have to have infinite support [link]. On
the other hand, it is very easy to use splines in biorthogonal wavelet systems. Choose $h$ to be a filter
that can generate splines, then [link] and [link] are linear in the coefficients of $\tilde h$. Thus we only have
to solve a set of linear equations to get $\tilde h$, and the resulting $\tilde h$ also has dyadic coefficients. In [link],
better methods are used to solve these equations indirectly.


The filter coefficients for some members of the Cohen-Daubechies-Feauveau family of biorthogonal
spline wavelets are listed in [link]. Note that they are symmetric. It has been shown that as the length
increases, the regularity of $\varphi$ and $\tilde\varphi$ of this family also increases [link].


$h/\sqrt{2}$               $\tilde h/\sqrt{2}$

1/2, 1/2                   -1/16, 1/16, 1/2, 1/2, 1/16, -1/16
1/4, 1/2, 1/4              -1/8, 1/4, 3/4, 1/4, -1/8
1/8, 3/8, 3/8, 1/8         -5/512, 15/512, 19/512, -97/512, -13/256, 175/256, ...


Coefficients for Some Members of Cohen-Daubechies-Feauveau Family of Biorthogonal Spline
Wavelets (For longer filters, we only list half of the coefficients)


Cohen-Daubechies-Feauveau Family of Biorthogonal Wavelets with Less Dissimilar Filter Length


The Cohen-Daubechies-Feauveau family of biorthogonal wavelets are perhaps the most widely used 
biorthogonal wavelets, since the scaling function and wavelet are symmetric and have similar lengths. 
A member of the family is used in the FBI fingerprint compression standard [link], [link]. The design 
method for this family is remarkably simple and elegant. 


In the frequency domain, [link] can be written as
Equation:

$$H(\omega)\, \tilde H^{*}(\omega) + H(\omega + \pi)\, \tilde H^{*}(\omega + \pi) = 2.$$

Recall from Chapter: Regularity, Moments, and Wavelet System Design that we have an explicit
solution for $|H(\omega)|^2 = M(\omega)$ such that
Equation:

$$M(\omega) + M(\omega + \pi) = 2,$$

and the resulting compactly supported orthogonal wavelet has the maximum number of zero
moments possible for its length. In the orthogonal case, we get a scaling filter by factoring $M(\omega)$ as
$H(\omega)\, H^{*}(\omega)$. Here in the biorthogonal case, we can factor the same $M(\omega)$ to get $H(\omega)$ and $\tilde H(\omega)$.


Factorizations that lead to symmetric $h$ and $\tilde h$ with similar lengths have been found in [link], and their
coefficients are listed in [link]. Plots of the scaling and wavelet functions, which are members of the
family used in the FBI fingerprint compression standard, are in [link].
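
As a small worked check of the factorization idea (a sketch, not the derivation used in [link]), consider the spline pair with $h/\sqrt{2} = (1/4, 1/2, 1/4)$ and $\tilde h/\sqrt{2} = (-1/8, 1/4, 3/4, 1/4, -1/8)$ from the spline family above. Centering both filters at $n = 0$, which is an indexing assumption of this example, makes both frequency responses real:

```latex
\begin{align*}
H(\omega) &= \sqrt{2}\left(\tfrac12 + \tfrac12\cos\omega\right)
           = \sqrt{2}\,\cos^2(\omega/2),\\
\tilde H(\omega) &= \sqrt{2}\left(\tfrac34 + \tfrac12\cos\omega - \tfrac14\cos 2\omega\right)
           = \sqrt{2}\,\cos^2(\omega/2)\left(1 + 2\sin^2(\omega/2)\right),\\
M(\omega) &= H(\omega)\,\tilde H^{*}(\omega)
           = 2\cos^4(\omega/2)\left(1 + 2\sin^2(\omega/2)\right),\\
M(\omega) + M(\omega+\pi) &= 2\cos^4(\omega/2)\left(1 + 2\sin^2(\omega/2)\right)
           + 2\sin^4(\omega/2)\left(1 + 2\cos^2(\omega/2)\right)\\
          &= 2\left(\cos^2(\omega/2) + \sin^2(\omega/2)\right)^2 = 2.
\end{align*}
```

This $M(\omega)$ is the same product that the length-4 Daubechies filter splits evenly as $|H(\omega)|^2$; here all of the factor $1 + 2\sin^2(\omega/2)$ is assigned to $\tilde H(\omega)$, which is what makes both filters symmetric.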


$\tilde h$                 $h$

0.85269867900889           0.78848561640637
0.37740285561283           0.41809227322204
-0.11062440441844          -0.04068941760920
-0.02384946501956          -0.06453888262876
0.03782845550726


Coefficients for One of the Cohen-Daubechies-Feauveau Family of Biorthogonal Wavelets that is
Used in the FBI Fingerprint Compression Standard (We only list half of the coefficients)


Tian-Wells Family of Biorthogonal Coiflets 


The coiflet system is a family of compactly supported orthogonal wavelets with zero moments of 
both the scaling functions and wavelets described in Section: Coiflets and Related Wavelet Systems. 
Compared with Daubechies' wavelets with only zero wavelet moments, the coiflets are more 
symmetrical and may have better approximation properties when sampled data are used. However, 
finding the orthogonal coiflets involves solving a set of nonlinear equations. No closed form solutions 
have been found, and when the length increases, numerically solving these equations becomes less 
stable. 


Tian and Wells [link], [link] have constructed biorthogonal wavelet systems with both zero scaling 
function and wavelet moments. Closed form solutions for these biorthogonal coiflets have been 
found. They have approximation properties similar to the coiflets, and the filter coefficients are 
dyadic rationals as are the splines. The filter coefficients for these biorthogonal Coiflets are listed in 
[link]. Some members of this family are also in the spline family described earlier. 


Lifting Construction of Biorthogonal Systems 


We have introduced several families of biorthogonal systems and their design methods. There is
another method called a lifting scheme, which is very simple and general. It has a long history
[link], [link], [link], [link], [link], [link], and has been systematically developed recently [link], [link].
The key idea is to build complicated biorthogonal systems using simple and invertible stages. The
first stage does nothing but separate even and odd samples, and it is easily invertible. The structure
is shown in [link], and is called the lazy wavelet transform in [link].


Plots of Scaling Function and Wavelet and their Duals for one of the Cohen-Daubechies-
Feauveau Family of Biorthogonal Wavelets that is Used in the FBI Fingerprint Compression
Standard


$\sqrt{2}\, h$                          $\sqrt{2}\, \tilde h$

1, 1                                    1, 1
1/2, 1, 1/2                             -1/4, 1/2, 3/2, 1/2, -1/4
3/8, 1, 3/4, 0, -1/8                    3/64, 0, -3/16, 3/8, 41/32, 3/4, -3/16, -1/8, 3/64
-1/16, 0, 9/16, 1, 9/16, 0, -1/16       -1/256, 0, 9/128, -1/16, -63/256, 9/16, 87/64, ...


Coefficients for some Members of the Biorthogonal Coiflets. For Longer Filters, We only List Half of
the Coefficients.


The Lazy Wavelet Transform (the input is split into its even and odd samples)


After splitting the data into two parts, we can predict one part from the other, and keep only the
prediction error, as in [link]. We can reconstruct the data by recomputing the prediction and then adding
it back to the prediction error. In [link], $s$ and $t$ are prediction filters.


By concatenating simple stages, we can implement the forward and inverse wavelet transforms as in
[link]. It is also called the ladder structure, and the reason for the name is clear from the figure.
Clearly, the system is invertible, and thus biorthogonal. Moreover, it has been shown that orthogonal
wavelet systems can also be implemented using lifting [link]. The advantages of lifting are numerous:


• Lifting steps can be calculated in place. As seen in [link], the prediction outputs based on one
channel of the data can be added to or subtracted from the data in other channels, and the results
can be saved in the same place in the second channel. No auxiliary memory is needed.

• The predictors $s$ and $t$ do not have to be linear. Nonlinear operations like the median filter or
rounding can be used, and the system remains invertible. This allows a very simple
generalization to nonlinear wavelet transform or nonlinear multiresolution analysis.

• The design of biorthogonal systems boils down to the design of the predictors. This may lead to
simple approaches that do not rely on the Fourier transform [link], and can be generalized to
irregular samples or manifolds.

• For biorthogonal systems, the lifting implementations require fewer numerical operations than
direct implementations [link]. For orthogonal cases, the lifting schemes have a computational
complexity similar to the lattice factorizations, which is almost half that of the direct
implementation.
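
The first two advantages can be illustrated with a short sketch of one lifting level that uses rounding (floor) inside both predictors. The particular predictors follow the familiar integer 5/3-style steps, and the edge handling (repeating the edge sample) is an assumption of this example; the function names are hypothetical. The input length is assumed to be even.

```python
import numpy as np

def lifting_forward(x):
    """One lifting level computed in place on a copy of x (even length assumed)."""
    y = np.array(x, dtype=np.int64)     # working copy; s and d below are views into it
    s = y[0::2]                         # lazy wavelet transform: even samples
    d = y[1::2]                         # odd samples
    # Dual lifting (predict): replace each odd sample by its prediction error.
    d -= (s + np.append(s[1:], s[-1])) // 2
    # Lifting (update): correct the even samples using the prediction errors.
    s += (np.insert(d[:-1], 0, d[0]) + d + 2) // 4
    return y                            # even slots: lowpass, odd slots: highpass

def lifting_inverse(y):
    """Undo lifting_forward by running the same steps backwards with opposite signs."""
    x = np.array(y, dtype=np.int64)
    s = x[0::2]
    d = x[1::2]
    s -= (np.insert(d[:-1], 0, d[0]) + d + 2) // 4
    d += (s + np.append(s[1:], s[-1])) // 2
    return x

rng = np.random.default_rng(0)
signal = rng.integers(-100, 100, size=16)
recovered = lifting_inverse(lifting_forward(signal))
```

Although the floor operation makes each step nonlinear, `recovered` equals `signal` exactly, because the inverse recomputes the identical predictions from data that has not yet been modified.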


The Lifting and Dual Lifting Step: (a) The Lifting Step, (b) The Dual Lifting Step
(channels labeled highpass and lowpass)


Wavelet Transform using Lifting (panel (b): The Inverse Wavelet Transform using Lifting)


Multiwavelets 


In Chapter: A multiresolution formulation of Wavelet Systems, we introduced the multiresolution
analysis for the space of $L^2$ functions, where we have a set of nesting subspaces
Equation:

$$\{0\} \subset \cdots \subset V_{-2} \subset V_{-1} \subset V_0 \subset V_1 \subset V_2 \subset \cdots \subset L^2,$$

where each subspace is spanned by translations of scaled versions of a single scaling function $\varphi$; e.g.,
Equation:

$$V_j = \underset{k}{\mathrm{Span}}\,\{2^{j/2}\, \varphi(2^j t - k)\}.$$

The direct difference between nesting subspaces is spanned by translations of a single wavelet at the
corresponding scale; e.g.,
Equation:

$$W_j = V_{j+1} \ominus V_j = \underset{k}{\mathrm{Span}}\,\{2^{j/2}\, \psi(2^j t - k)\}.$$

There are several limitations of this construction. For example, nontrivial orthogonal wavelets cannot
be symmetric. To avoid this problem, we generalized the basic construction, and introduced
multiplicity-$M$ ($M$-band) scaling functions and wavelets in [link], where the difference spaces are
spanned by translations of $M - 1$ wavelets. The scaling is in terms of the power of $M$; i.e.,

Equation:

$$\varphi_{j,k}(t) = M^{j/2}\, \varphi(M^j t - k).$$

In general, there are more degrees of freedom to design the M-band wavelets. However, the nested $V$
spaces are still spanned by translations of a single scaling function. It is the multiwavelets that
remove the above restriction, thus allowing multiple scaling functions to span the nested $V$ spaces
[link], [link], [link]. Although it is possible to construct M-band multiwavelets, here we only present
results on the two-band case, as most of the research in the literature does.


Construction of Two-Band Multiwavelets 


Assume that $V_0$ is spanned by translations of $R$ different scaling functions $\varphi_i(t)$, $i = 1, \ldots, R$. For a
two-band system, we define the scaling and translation of these functions by
Equation:

$$\varphi_{i,j,k}(t) = 2^{j/2}\, \varphi_i(2^j t - k).$$

The multiresolution formulation implies
Equation:

$$V_j = \underset{k}{\mathrm{Span}}\,\{\varphi_{i,j,k}(t) : i = 1, \ldots, R\}.$$

We next construct a vector scaling function by
Equation:

$$\Phi(t) = [\varphi_1(t), \ldots, \varphi_R(t)]^T.$$

Since $V_0 \subset V_1$, we have
Equation:

$$\Phi(t) = \sqrt{2} \sum_n H(n)\, \Phi(2t - n)$$

where $H(k)$ is an $R \times R$ matrix for each $k \in \mathbb{Z}$. This is a matrix version of the scalar recursive
equation [link]. The first and simplest multiscaling functions probably appear in [link], and they are
shown in [link].


The Simplest Alpert Multiscaling Functions (panels: $\varphi_1(t)$ and $\varphi_2(t)$)


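The first scaling function $\varphi_1(t)$ is nothing but the Haar scaling function, and it is the sum of two
time-compressed and shifted versions of itself, as shown in [link](a). The second scaling function can
be easily decomposed into linear combinations of time-compressed and shifted versions of the Haar
scaling function and itself, as

Equation:

$$\varphi_2(t) = \frac{\sqrt{3}}{2}\, \varphi_1(2t) + \frac{1}{2}\, \varphi_2(2t) - \frac{\sqrt{3}}{2}\, \varphi_1(2t - 1) + \frac{1}{2}\, \varphi_2(2t - 1).$$

This is shown in [link].

Multiwavelet Refinement Equation [link]


Putting the two scaling functions together, we have


Equation:

$$\begin{bmatrix} \varphi_1(t) \\ \varphi_2(t) \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ \sqrt{3}/2 & 1/2 \end{bmatrix} \begin{bmatrix} \varphi_1(2t) \\ \varphi_2(2t) \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ -\sqrt{3}/2 & 1/2 \end{bmatrix} \begin{bmatrix} \varphi_1(2t - 1) \\ \varphi_2(2t - 1) \end{bmatrix}.$$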
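
The matrix refinement equation can be checked numerically. The sketch below assumes explicit forms for the two scaling functions on $[0, 1)$, namely the Haar box $\varphi_1(t) = 1$ and the normalized linear function $\varphi_2(t) = \sqrt{3}(1 - 2t)$; the sign convention of $\varphi_2$ and the function names are assumptions of this example. The matrices are written in the convention $\Phi(t) = \sqrt{2} \sum_n H(n) \Phi(2t - n)$.

```python
import numpy as np

SQ3 = np.sqrt(3.0)

def Phi(t):
    """Stacked scaling functions [phi_1, phi_2] supported on [0, 1)."""
    t = np.asarray(t, dtype=float)
    inside = (t >= 0) & (t < 1)
    phi1 = np.where(inside, 1.0, 0.0)                 # Haar scaling function
    phi2 = np.where(inside, SQ3 * (1 - 2 * t), 0.0)   # assumed linear companion
    return np.vstack([phi1, phi2])

# Matrix coefficients H(0), H(1); the 1/sqrt(2) matches the sqrt(2) in the recursion.
H0 = np.array([[1, 0], [SQ3 / 2, 0.5]]) / np.sqrt(2)
H1 = np.array([[1, 0], [-SQ3 / 2, 0.5]]) / np.sqrt(2)

# Sample points that avoid the breakpoints 0, 1/2, and 1.
t = (np.arange(400) + 0.5) / 200 - 0.5
lhs = Phi(t)
rhs = np.sqrt(2) * (H0 @ Phi(2 * t) + H1 @ Phi(2 * t - 1))
refinement_holds = np.allclose(lhs, rhs)

# Both functions have unit norm and are mutually orthogonal (midpoint-rule check).
dt = t[1] - t[0]
gram = (lhs @ lhs.T) * dt
```

The Gram matrix comes out close to the identity, which is why this pair can serve as an orthonormal basis of $V_0$ together with its integer translates.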


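Further assume $R$ wavelets span the difference spaces; i.e.,
Equation:

$$W_j = V_{j+1} \ominus V_j = \underset{k}{\mathrm{Span}}\,\{\psi_{i,j,k}(t) : i = 1, \ldots, R\}.$$

Since $W_0 \subset V_1$, for the stacked wavelets $\Psi(t)$ there must exist a sequence of $R \times R$ matrices $G(k)$,
such that
Equation:

$$\Psi(t) = \sqrt{2} \sum_k G(k)\, \Phi(2t - k).$$

These are vector versions of the two scale recursive equations [link] and [link].


We can also define the discrete-time Fourier transform of $H(k)$ and $G(k)$ as
Equation:

$$\mathbf{H}(\omega) = \frac{1}{\sqrt{2}} \sum_k H(k)\, e^{i\omega k}, \qquad \mathbf{G}(\omega) = \frac{1}{\sqrt{2}} \sum_k G(k)\, e^{i\omega k}.$$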


Properties of Multiwavelets 


Approximation, Regularity and Smoothness 


Recall from Chapter: Regularity, Moments, and Wavelet System Design that the key to regularity and
smoothness is having enough zeros at $\pi$ for $H(\omega)$. For multiwavelets, it has been shown
that polynomials can be exactly reproduced by translates of $\Phi(t)$ if and only if $\mathbf{H}(\omega)$ can be factored
in a special form [link], [link], [link]. The factorization is used to study the regularity and convergence
of refinable function vectors [link], and to construct multi-scaling functions with approximation and
symmetry [link]. Approximation and smoothness of multiple refinable functions are also studied in
[link], [link], [link].


Support 


In general, the finite length of $H(k)$ and $G(k)$ ensures the finite support of $\Phi(t)$ and $\Psi(t)$. However,
there are no straightforward relations between the support length and the number of nonzero
coefficients in $H(k)$ and $G(k)$. An explanation is the existence of nilpotent matrices [link]. A method
to estimate the support is developed in [link].


Orthogonality 


For these scaling functions and wavelets to be orthogonal to each other and orthogonal to their 
translations, we need [link] 


Equation: 

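$$H(\omega)H^{\dagger}(\omega) + H(\omega + \pi)H^{\dagger}(\omega + \pi) = I_R,$$
Equation:

$$G(\omega)G^{\dagger}(\omega) + G(\omega + \pi)G^{\dagger}(\omega + \pi) = I_R,$$
Equation:

$$H(\omega)G^{\dagger}(\omega) + H(\omega + \pi)G^{\dagger}(\omega + \pi) = 0_R,$$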


where $\dagger$ denotes the complex conjugate transpose, and $I_R$ and $0_R$ are the $R \times R$ identity and zero
matrices, respectively. These are the matrix versions of [link] and [link]. In the scalar case, [link] can be easily
satisfied if we choose the wavelet filter by time-reversing the scaling filter and changing the sign of
every other coefficient. However, for the matrix case here, since matrices do not commute in
general, we cannot derive the $G(k)$'s from the $H(k)$'s so straightforwardly. This presents some difficulty
in finding the wavelets from the scaling functions; however, it also gives us flexibility to design
different wavelets even if the scaling functions are fixed [link].


The conditions in [link] through [link] are necessary but not sufficient. Generalizations of Lawton's sufficient
condition (Theorem 14 in Chapter: The Scaling Function and Scaling Coefficients, Wavelet
and Wavelet Coefficients) have been developed in [link], [link], [link].


Implementation of Multiwavelet Transform 


Let the expansion coefficients of multiscaling functions and multiwavelets be
Equation:

$$c_{i,j}(k) = \langle f(t), \varphi_{i,j,k}(t) \rangle,$$
Equation:

$$d_{i,j}(k) = \langle f(t), \psi_{i,j,k}(t) \rangle.$$

We create vectors by
Equation:

$$C_j(k) = [c_{1,j}(k), \dots, c_{R,j}(k)]^T,$$
Equation:

$$D_j(k) = [d_{1,j}(k), \dots, d_{R,j}(k)]^T.$$


For $f(t)$ in $V_0$, it can be written as linear combinations of scaling functions and wavelets,
Equation:

$$f(t) = \sum_k C_{j_0}(k)^T\,\Phi_{j_0,k}(t) + \sum_{j=j_0}^{\infty} \sum_k D_j(k)^T\,\Psi_{j,k}(t).$$


Using [link] and [link], we have
Equation:

$$C_{j-1}(k) = \sqrt{2}\,\sum_n H(n)\,C_j(2k + n)$$

and
Equation:

$$D_{j-1}(k) = \sqrt{2}\,\sum_n G(n)\,C_j(2k + n).$$


Discrete Multiwavelet Transform


Moreover,
Equation:

$$C_j(n) = \sqrt{2}\,\sum_k \left( H(n - 2k)^{\dagger}\,C_{j-1}(k) + G(n - 2k)^{\dagger}\,D_{j-1}(k) \right).$$


These are the vector forms of [link], [link], and [link]. Thus the synthesis and analysis filter banks for
multiwavelet transforms have structures similar to those of the scalar case. The difference is that the filter
banks operate on blocks of $R$ inputs, and the filtering and rate-changing are all done in terms of
blocks of inputs.
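The block structure can be made concrete with a minimal sketch of one analysis step and one synthesis step for $R = 2$, using periodic extension of the block sequence. The two-tap matrices `H` and `G` below are illustrative values chosen for this sketch (an assumption, not filters quoted from the text); they are scaled so that $\sqrt{2}\,[H(0)\ H(1);\ G(0)\ G(1)]$ is a $4 \times 4$ orthogonal matrix, which makes the step exactly invertible. The helper names are likewise hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)
S3 = math.sqrt(3.0)

# Illustrative 2x2 taps (assumed for this sketch, not taken from the text).
# sqrt(2) * [H(0) H(1); G(0) G(1)] is a 4x4 orthogonal matrix.
H = [[[0.5, 0.0], [S3 / 4, 0.25]],
     [[0.5, 0.0], [-S3 / 4, 0.25]]]
G = [[[0.0, 0.5], [0.25, -S3 / 4]],
     [[0.0, -0.5], [-0.25, -S3 / 4]]]

def matvec(M, v):
    """Multiply a small matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def analysis_step(C, H, G):
    """C_{j-1}(k) and D_{j-1}(k) as sqrt(2) * sum_n H(n) (or G(n)) C_j(2k + n)."""
    N, R = len(C), len(C[0])
    coarse, detail = [], []
    for k in range(N // 2):
        c, d = [0.0] * R, [0.0] * R
        for n in range(len(H)):
            x = C[(2 * k + n) % N]  # periodic extension of the block sequence
            c = [a + SQRT2 * b for a, b in zip(c, matvec(H[n], x))]
            d = [a + SQRT2 * b for a, b in zip(d, matvec(G[n], x))]
        coarse.append(c)
        detail.append(d)
    return coarse, detail

def synthesis_step(coarse, detail, H, G):
    """Adjoint of analysis_step; the taps are real, so the dagger is a transpose."""
    N, R = 2 * len(coarse), len(coarse[0])
    C = [[0.0] * R for _ in range(N)]
    for k in range(len(coarse)):
        for n in range(len(H)):
            m = (2 * k + n) % N
            u = matvec(transpose(H[n]), coarse[k])
            v = matvec(transpose(G[n]), detail[k])
            C[m] = [a + SQRT2 * (p + q) for a, p, q in zip(C[m], u, v)]
    return C

# Four blocks of R = 2 scaling coefficients (arbitrary test data).
blocks = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0], [-2.0, 1.5]]
coarse, detail = analysis_step(blocks, H, G)
rebuilt = synthesis_step(coarse, detail, H, G)
```

Every arithmetic operation acts on a length-$R$ vector with an $R \times R$ matrix, and the rate change halves the number of blocks rather than the number of samples.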


To start the multiwavelet transform, we need to get the scaling coefficients at high resolution. Recall
that in the scalar case, the scaling functions are close to delta functions at very high resolution, so the
samples of the function are used as the scaling coefficients. However, for multiwavelets we need the
expansion coefficients for $R$ scaling functions. Simply using nearby samples as the scaling
coefficients is a bad choice. Data samples need to be preprocessed (prefiltered) to produce reasonable
values of the expansion coefficients for the multiscaling functions at the highest scale. Prefilters have
been designed based on interpolation [link], approximation [link], and orthogonal projection [link].


Examples 


Because of the larger degree of freedom, many methods for constructing multiwavelets have been 
developed. 


Geronimo-Hardin-Massopust Multiwavelets 


A set of multiscaling filters based on fractal interpolation functions was developed in [link], and the
corresponding multiwavelets were constructed in [link]. As shown in [link], they
are both symmetrical and orthogonal, a combination which is impossible for two-band orthogonal
scalar wavelets. They also have short support, and can exactly reproduce the hat function. These
interesting properties make multiwavelets a promising expansion system.


Geronimo-Hardin-Massopust Multi-scaling Function and Strang-Strela Multiwavelets


Spline Multiwavelets 


Spline bases have a maximal approximation order with respect to their length; however, spline
uniwavelets are only semiorthogonal [link]. A family of spline multiwavelets that are symmetric and
orthogonal is developed in [link].


Other Constructions 


Other types of multiwavelets are constructed using Hermite interpolating conditions [link], matrix
spectral factorization [link], finite elements [link], and oblique projections [link]. Similar to
multiwavelets, vector-valued wavelets and vector filter banks have also been developed [link].


Applications 


Multiwavelets have been used in data compression [link], [link], [link], noise reduction [link], [link],
and solution of integral equations [link]. Because multiwavelets are able to offer a combination of
orthogonality, symmetry, higher order of approximation and short support, methods using
multiwavelets frequently outperform those using comparable scalar wavelets. However, it is found
that prefiltering is very important, and the prefilter should be chosen carefully for the application [link], [link],
[link]. Also, since discrete multiwavelet transforms operate on size-$R$ blocks of data and generate
blocks of wavelet coefficients, the correlation within each block of coefficients needs to be exploited.
For image compression, prediction rules have been proposed to exploit the correlation in order to reduce the
bit rate [link]. For noise reduction, jointly thresholding the coefficients within each block improves the
performance [link].


Overcomplete Representations, Frames, Redundant Transforms, and Adaptive Bases 


In this chapter, we apply the ideas of frames and tight frames introduced in Chapter: Bases,
Orthogonal Bases, Biorthogonal Bases, Frames, Tight Frames, and Unconditional Bases as well as
bases to obtain a more efficient representation of many interesting signal classes. It might be helpful
to review the material on bases and frames in that chapter while reading this section.


Traditional basis systems such as Fourier, Gabor, wavelet, and wave packets are efficient 
representations for certain classes of signals, but there are many cases where a single system is not 
effective. For example, the Fourier basis is an efficient system for sinusoidal or smooth periodic 
signals, but poor for transient or chirp-like signals. Each system seems to be best for a rather well- 
defined but narrow class of signals. Recent research indicates that significant improvements in 
efficiency can be achieved by combining several basis systems. One can intuitively imagine 
removing Fourier components until the expansion coefficients quit dropping off rapidly, then 
switching to a different basis system to expand the residual and, after that expansion quits dropping 
off rapidly, switching to still another. Clearly, this is not a unique expansion because the order in which the
expansion systems are used would give different results. This is because the total expansion system is a
linear combination of the individual basis systems and is, therefore, not a basis itself but a frame. It is
an overcomplete expansion system, and a variety of criteria have been developed to use the freedom
of the nonuniqueness of the expansion to advantage. The collection of basis systems from which a 
subset of expansion vectors is chosen is sometimes called a dictionary. 


There are at least two descriptions of the problem. We may want a single expansion system to handle
several different classes of signals, each of which is well-represented by a particular basis system, or
we may have a single class of signals, but the elements of that class are linear combinations of
members of the well-represented classes. In either case, there are several criteria that have been
identified as important [link], [link]:


• Sparsity: The expansion should have most of the important information in the smallest number
of coefficients so that the others are small enough to be neglected or set equal to zero. This is
important for compression and denoising.

• Separation: If the measurement consists of a linear combination of signals with different
characteristics, the expansion coefficients should clearly separate those signals. If a single signal
has several features of interest, the expansion should clearly separate those features. This is
important for filtering and detection.

• Superresolution: The resolution of signals or characteristics of a signal should be much better
than with a traditional basis system. This is likewise important for linear filtering, detection, and
estimation.

• Stability: The expansions in terms of our new overcomplete systems should not be significantly
changed by perturbations or noise. This is important in implementation and data measurement.

• Speed: The numerical calculation of the expansion coefficients in the new overcomplete system
should be of order $O(N)$ or $O(N \log N)$.


These criteria are often in conflict with each other, and various compromises will be made in the 
algorithms and problem formulations for an acceptable balance. 


Overcomplete Representations 


This section uses the material in Chapter: Bases, Orthogonal Bases, Biorthogonal Bases, Frames,
Tight Frames, and Unconditional Bases on bases and frames. One goal is to represent a signal using a
"dictionary" of expansion functions that could include the Fourier basis, wavelet basis, Gabor basis,
etc. We formulate a finite dimensional version of this problem as


Equation:
$$y(n) = \sum_k \alpha_k\,x_k(n), \qquad n, k \in \mathbb{Z}$$
for $n = 0, 1, 2, \dots, N-1$ and $k = 0, 1, 2, \dots, K-1$. This can be written in matrix form as
Equation:


$$y = X\alpha$$


where $y$ is an $N \times 1$ vector with elements being the signal values $y(n)$, the matrix $X$ is $N \times K$, the
columns of which are made up of all the functions in the dictionary, and $\alpha$ is a $K \times 1$ vector of the
expansion coefficients $\alpha_k$. The matrix operator has the basis signals $x_k$ as its columns so that the
matrix multiplication [link] is simply the signal expansion [link].


For a given signal representation problem, one has two decisions: what dictionary to use (i.e., choice
of the $X$) and how to represent the signal in terms of this dictionary (i.e., choice of $\alpha$). Since the
dictionary is overcomplete, there are several possible choices of $\alpha$, and typically one uses prior
knowledge or one or more of the desired properties we saw earlier to calculate the $\alpha$.


A Matrix Example 


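Consider a simple two-dimensional system with orthogonal basis vectors
Equation:
$$x_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \quad \text{and} \quad x_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$


which gives the matrix operator with $x_1$ and $x_2$ as columns


Equation:
$$X = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$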


Decomposition with this rather trivial operator gives a time-domain description in that the first
expansion coefficient $\alpha_0$ is simply the first value of the signal, $y(0)$, and the second coefficient is the
second value of the signal. Using a different set of basis vectors might give the operator

Equation:


$$X = \begin{bmatrix} 0.7071 & 0.7071 \\ 0.7071 & -0.7071 \end{bmatrix}$$


which has the normalized basis vectors still orthogonal but now at a 45° angle from the basis vectors
in [link]. This decomposition is a sort of frequency domain expansion. The first column vector will
simply be the constant signal, and its expansion coefficient $\alpha_0$ will be proportional to the average of the signal.
The coefficient of the second vector will calculate the difference in $y(0)$ and $y(1)$ and, therefore, be a
measure of the change.


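Notice that $y = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ can be represented exactly with only one nonzero coefficient using [link] but
will require two with [link], while for $y = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ the opposite is true. This means the signals
$y = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $y = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ can be represented sparsely by [link] while
$y = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $y = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$ can be represented
sparsely by [link].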


If we create an overcomplete expansion by combining the previous orthogonal basis
systems, then it should be possible to have a sparse representation for all four of the previous signals.
This is done by simply appending the columns of [link] to those of [link] to give

Equation:


$$X = \begin{bmatrix} 1 & 0 & 0.7071 & 0.7071 \\ 0 & 1 & 0.7071 & -0.7071 \end{bmatrix}.$$


This is clearly overcomplete, having four expansion vectors in a two-dimensional system. Finding $\alpha_k$
requires solving a set of underdetermined equations, and the solution is not unique.


For example, if the signal is given by
Equation:


$$y = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$


there are infinitely many solutions, several of which are listed in the following table.


Case              1        2        3        4        5        6        7
$\alpha_0$        0.5000   1.0000   1.0000   1.0000   0        0        0
$\alpha_1$        0.0000   0.0000   0        0        -1.0000  1.0000   0
$\alpha_2$        0.3536   0        0.0000   0        1.4142   0        0.7071
$\alpha_3$        0.3536   0        0        0.0000   0        1.4142   0.7071
$\|\alpha\|^2$    0.5000   1.0000   1.0000   1.0000   3.0000   3.0000   1.0000


Case 1 is the minimum norm solution of $y = X\alpha$ for $\alpha$. It is calculated by a pseudoinverse with
the Matlab command a = pinv(X)*y. It is also the redundant DWT discussed in the next
section and calculated by a = X'*y/2. Case 2 is the minimum norm solution, but for no more than
two nonzero values of $\alpha$. Case 2 can also be calculated by inverting the matrix [link] with columns 3
and 4 deleted. Case 3 is calculated the same way with columns 2 and 4 deleted, case 4 has columns 2
and 3 deleted, case 5 has 1 and 4 deleted, case 6 has 1 and 3 deleted, and case 7 has 1 and 2 deleted.
Cases 3 through 7 are unique since the reduced matrix is square and nonsingular. The second term of
$\alpha$ for case 1 is zero because the signal is orthogonal to that expansion vector. Notice that the squared norm of
$\alpha$ is minimum for case 1 and is equal to the squared norm of $y$ divided by the redundancy, here two. Also
notice that the coefficients in cases 2, 3, and 4 are the same even though calculated by different
methods.


Because $X$ is not only a frame, but a tight frame with a redundancy of two, the energy (norm
squared) of $\alpha$ is one-half the norm squared of $y$. The other decompositions (not tight frame or basis)
do not preserve the energy.
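The cases in the table can be reproduced with a short sketch. It relies on two facts stated above: for this tight frame the minimum norm solution is `X' * y / 2`, and each two-vector case is the solution of a 2 × 2 system with two columns kept. The function names are hypothetical, and Cramer's rule stands in for the matrix inversion mentioned in the text.

```python
import math

S = 1.0 / math.sqrt(2.0)
# Columns are the four expansion vectors of the overcomplete system.
X = [[1.0, 0.0, S, S],
     [0.0, 1.0, S, -S]]

def tight_frame_coefficients(X, y, redundancy=2.0):
    """Minimum norm coefficients a = X^T y / redundancy (valid for this tight frame)."""
    return [sum(X[r][c] * y[r] for r in range(len(y))) / redundancy
            for c in range(len(X[0]))]

def two_vector_coefficients(X, y, i, j):
    """Solve y = a_i x_i + a_j x_j by Cramer's rule; all other coefficients are zero."""
    a, b, c, d = X[0][i], X[0][j], X[1][i], X[1][j]
    det = a * d - b * c
    coeffs = [0.0] * len(X[0])
    coeffs[i] = (y[0] * d - b * y[1]) / det
    coeffs[j] = (a * y[1] - c * y[0]) / det
    return coeffs

def energy(coeffs):
    """Squared norm of a coefficient vector."""
    return sum(v * v for v in coeffs)

y = [1.0, 0.0]
case1 = tight_frame_coefficients(X, y)        # all four vectors
case5 = two_vector_coefficients(X, y, 1, 2)   # keep columns 2 and 3 (0-based 1 and 2)
case7 = two_vector_coefficients(X, y, 2, 3)   # keep columns 3 and 4 (0-based 2 and 3)
print([round(v, 4) for v in case1], round(energy(case1), 4))
print([round(v, 4) for v in case5], round(energy(case5), 4))
print([round(v, 4) for v in case7], round(energy(case7), 4))
```

Changing `y` to `[0.9806, 0.1961]` gives the coefficients for a signal that is not aligned with any single expansion vector.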


Next consider a two-dimensional signal that cannot be exactly represented by only one expansion
vector. If the unity norm signal is given by
Equation:

$$y = \begin{bmatrix} 0.9806 \\ 0.1961 \end{bmatrix}$$


the expansion coefficients are listed next for the same cases described previously.


Case              1        2        3        4        5        6        7
$\alpha_0$        0.4903   0.9806   0.7845   1.1767   0        0        0
$\alpha_1$        0.0981   0.1961   0        0        -0.7845  1.1767   0
$\alpha_2$        0.4160   0        0.2774   0        1.3868   0        0.8321
$\alpha_3$        0.2774   0        0        -0.2774  0        1.3868   0.5547
$\|\alpha\|^2$    0.5000   1.0000   0.6923   1.4615   2.5385   3.3077   1.0000


Again, case 1 is the minimum norm solution; however, it has no zero components this time because
there are no expansion vectors orthogonal to the signal. Since the signal lies between the 0° and 45°
expansion vectors, it is case 3 which has the least two-vector energy representation.


There is an infinite variety of ways to construct the overcomplete frame matrix $X$. The one in this
example is a four-vector tight frame. Each vector is 45° apart from nearby vectors. Thus they
are evenly distributed in the 180° upper plane of the two-dimensional space. The lower plane is
covered by the negative of these frame vectors. A three-vector tight frame would have three columns,
each 60° from each other in the two-dimensional plane. A 36-vector tight frame would have 36
columns spaced 5° from each other. In that system, any signal vector would be very close to an
expansion vector.
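The tight-frame property of such evenly spaced systems is easy to check numerically. The following sketch places M unit vectors at angles of kπ/M in the upper half plane and forms the 2 × 2 product of the frame matrix with its transpose, which for a tight frame should be a multiple of the identity, here (M/2) times it. The function names are illustrative assumptions.

```python
import math

def evenly_spaced_frame(m):
    """2 x m matrix whose columns are unit vectors at angles k*pi/m, k = 0..m-1."""
    angles = [k * math.pi / m for k in range(m)]
    return [[math.cos(a) for a in angles],
            [math.sin(a) for a in angles]]

def gram_of_rows(X):
    """Return X X^T as a 2 x 2 nested list."""
    return [[sum(p * q for p, q in zip(r1, r2)) for r2 in X] for r1 in X]

for m in (3, 4, 36):
    XXt = gram_of_rows(evenly_spaced_frame(m))
    # For a tight frame the result is (m / 2) times the identity.
    print(m, [[round(v, 6) for v in row] for row in XXt])
```

For M = 4 this is the same frame as in the example above, up to the sign and ordering of the columns.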


Still another alternative would be to construct a frame (not tight) with nonorthogonal rows. This 
would result in columns that are not evenly spaced but might better describe some particular class of 
signals. Indeed, one can imagine constructing a frame operator with closely spaced expansion vectors 
in the regions where signals are most likely to occur or where they have the most energy. 


We next consider a particular modified tight frame constructed so as to give a shift-invariant DWT. 


Shift-Invariant Redundant Wavelet Transforms and Nondecimated Filter Banks 


One of the few flaws in the various wavelet basis decompositions and wavelet transforms is the fact
that the DWT is not translation-invariant. If you shift a signal, you would like the DWT coefficients to
simply shift, but the transform does more than that. It significantly changes character.


Imagine a DWT of a signal that is a wavelet itself. For example, if the signal were
Equation:

$$y(n) = \psi(2^4 n - 10)$$

then the DWT would be
Equation:


$$d_4(10) = 1, \qquad \text{all other } d_j(k) = c(k) = 0.$$


In other words, the series expansion in the orthogonal wavelet basis would have only one nonzero 
coefficient. 


If we shifted the signal to the right so that $y(n) = \psi(2^4(n - 1) - 10)$, there would be many
nonzero coefficients because at this shift or translation, the signal is no longer orthogonal to most of
the basis functions. The signal energy would be partitioned over many more coefficients and,
therefore, because of Parseval's theorem, each coefficient would be smaller. This would degrade any denoising or
compression using thresholding schemes. The DWT described in Chapter: Calculation of the
Discrete Wavelet Transform is periodic in that at each scale $j$ the periodized DWT repeats itself after
a shift of $n = 2^j$, but the period depends on the scale. This can also be seen from the filter bank
calculation of the DWT where each scale goes through a different number of decimators and
therefore has a different aliasing.
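This effect can be illustrated with a small experiment. The sketch below uses the orthonormal Haar system on 16 samples as a stand-in for the general wavelet in the text (an assumption made to keep the code short). The signal is one Haar basis vector, so its DWT has a single nonzero coefficient. A circular shift by one sample spreads the same energy over several coefficients.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt(x):
    """Orthonormal Haar DWT of a length-2^J list.

    Returns (final scaling coefficients, list of detail bands from fine to coarse).
    """
    c, details = list(x), []
    while len(c) > 1:
        half = len(c) // 2
        details.append([(c[2 * k] - c[2 * k + 1]) / SQRT2 for k in range(half)])
        c = [(c[2 * k] + c[2 * k + 1]) / SQRT2 for k in range(half)]
    return c, details

def count_nonzero(c, details, tol=1e-12):
    flat = c + [v for band in details for v in band]
    return sum(1 for v in flat if abs(v) > tol)

def energy(c, details):
    return sum(v * v for v in c) + sum(v * v for band in details for v in band)

# A unit-norm Haar wavelet supported on samples 4..7.
signal = [0.0] * 16
signal[4:8] = [0.5, 0.5, -0.5, -0.5]
shifted = signal[-1:] + signal[:-1]   # circular shift to the right by one sample

print(count_nonzero(*haar_dwt(signal)), energy(*haar_dwt(signal)))
print(count_nonzero(*haar_dwt(shifted)), energy(*haar_dwt(shifted)))
```

Both transforms carry the same total energy, as Parseval's theorem requires, but the shifted signal needs more coefficients to hold it, so each one is smaller and more likely to fall below a threshold.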


A method to create a linear, shift-invariant DWT is to construct a frame from the orthogonal DWT 
supplemented by shifted orthogonal DWTs using the ideas from the previous section. If you do this, 
the result is a frame and, because of the redundancy, is called the redundant DWT or RDWT. 


The typical wavelet-based signal processing framework consists of the following three simple steps:
1) wavelet transform; 2) point-by-point processing of the wavelet coefficients (e.g., thresholding for
denoising, quantization for compression); 3) inverse wavelet transform. The diagram of the
framework is shown in [link]. As mentioned before, the wavelet transform is not translation-invariant,
so if we shift the signal, perform the above processing, and shift the output back, then the results are
different for different shifts.


The Typical Wavelet Transform Based Signal Processing Framework (A
denotes the pointwise processing)


The Typical Redundant Wavelet Transform Based Signal Processing
Framework (A denotes the pointwise processing)


Since the frame vectors of the RDWT consist of the shifted orthogonal
DWT basis, if we replace the forward/inverse wavelet transform
in the above framework by the forward/inverse RDWT, then the result of the scheme in [link] is the
same as the average of all the processing results using DWTs with different shifts of the input data.
This is one of the main reasons that RDWT-based signal processing tends to be more robust.
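The averaging interpretation can be sketched directly: shift the input, apply the three-step DWT scheme, shift back, and average over all circular shifts. The code below does this with an orthonormal Haar DWT and hard thresholding, both chosen only for illustration. It implements the averaging side of the equivalence, not an RDWT filter bank, and all function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_forward(x):
    """Orthonormal Haar DWT: returns (scaling coefficients, detail bands fine to coarse)."""
    c, details = list(x), []
    while len(c) > 1:
        half = len(c) // 2
        details.append([(c[2 * k] - c[2 * k + 1]) / SQRT2 for k in range(half)])
        c = [(c[2 * k] + c[2 * k + 1]) / SQRT2 for k in range(half)]
    return c, details

def haar_inverse(c, details):
    c = list(c)
    for band in reversed(details):
        nxt = []
        for a, b in zip(c, band):
            nxt += [(a + b) / SQRT2, (a - b) / SQRT2]
        c = nxt
    return c

def hard_threshold(details, t):
    """Pointwise processing step: zero every detail coefficient with magnitude <= t."""
    return [[v if abs(v) > t else 0.0 for v in band] for band in details]

def dwt_process(x, t):
    """The three-step scheme: DWT, pointwise processing, inverse DWT."""
    c, details = haar_forward(x)
    return haar_inverse(c, hard_threshold(details, t))

def rotate(x, s):
    """Circular shift to the right by s samples (negative s shifts left)."""
    s %= len(x)
    return x[-s:] + x[:-s] if s else list(x)

def average_over_shifts(x, t):
    """Average of shift, process, unshift over all circular shifts of the input."""
    n = len(x)
    acc = [0.0] * n
    for s in range(n):
        y = rotate(dwt_process(rotate(x, s), t), -s)
        acc = [a + v / n for a, v in zip(acc, y)]
    return acc

x = [4.0, 4.2, 3.9, 4.1, 0.1, -0.2, 0.0, 0.3]
smoothed = average_over_shifts(x, 0.5)
```

Because every shift is included in the average, shifting the input simply shifts the output, which is the translation invariance that the single-DWT scheme lacks.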


Still another view of this new transform can be had by looking at the Mallat-derived filter bank
described in Chapter: The Scaling Function and Scaling Coefficients, Wavelet and Wavelet
Coefficients and Chapter: Filter Banks and Transmultiplexers. The DWT filter banks illustrated
in Figure: Two-Stage Two-Band Analysis Tree and Figure: Two-Band Synthesis Bank can be
modified by removing the decimators between each stage to give the coefficients of the tight frame
expansion (the RDWT) of the signal. We call this structure the undecimated filter bank. Notice that,
without the decimation, the number of terms in the DWT is larger than $N$. However, since these are
the expansion coefficients in our new overcomplete frame, that is consistent. Also, notice that this
idea can be applied to M-band wavelets and wavelet packets in the same way.


These RDWTs are not precisely a tight frame because each scale has a different redundancy.
However, except for this factor, the RDWT and undecimated filter bank have the same characteristics as a
tight frame, and they support a form of Parseval's theorem or energy partitioning.
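A minimal sketch of an undecimated filter bank, assuming orthonormal Haar filters and periodic extension (choices made for brevity, not prescribed by the text), shows both properties. Each stage keeps all N outputs and spreads the filter taps by 2^j instead of decimating. The energy of each band must then be weighted by its redundancy to recover the signal energy.

```python
import math

SQRT2 = math.sqrt(2.0)

def undecimated_haar(x, levels):
    """Haar analysis bank with the decimators removed (periodic extension).

    At stage j the two filter taps are spaced 2**j samples apart, and every
    band keeps all N samples. Returns (scaling band, list of detail bands).
    """
    n = len(x)
    c, details = list(x), []
    for j in range(levels):
        step = 2 ** j
        d = [(c[i] - c[(i + step) % n]) / SQRT2 for i in range(n)]
        c = [(c[i] + c[(i + step) % n]) / SQRT2 for i in range(n)]
        details.append(d)
    return c, details

def weighted_energy(c, details):
    """Band energies divided by their redundancy 2**(j + 1), plus the scaling band."""
    total = sum(v * v for v in c) / 2 ** len(details)
    for j, band in enumerate(details):
        total += sum(v * v for v in band) / 2 ** (j + 1)
    return total

def rotate(x, s):
    """Circular shift to the right by s samples."""
    s %= len(x)
    return x[-s:] + x[:-s] if s else list(x)

x = [3.0, -1.0, 2.0, 0.5, 4.0, 1.0, -2.0, 0.0]
c, details = undecimated_haar(x, 3)
print(len(c) + sum(len(d) for d in details))   # N * (levels + 1) coefficients
print(sum(v * v for v in x), weighted_energy(c, details))
```

Shifting `x` circularly shifts every band by the same amount, and the coefficient count grows as N times the number of stages, consistent with the N log N storage discussed below.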


If we use this modified tight frame as a dictionary to choose a particular subset of expansion vectors 
as a new frame or basis, we can tailor the system to the signal or signal class. This is discussed in the 
next section on adaptive systems. 


This idea of RDWT was suggested by Mallat [link], Beylkin [link], Shensa [link], Dutilleux [link],
Nason [link], Guo [link], [link], Coifman, and others. This redundancy comes at a price of the new
RDWT having $O(N \log N)$ arithmetic complexity rather than $O(N)$. Liang and Parks [link],
[link], Bao and Erdol [link], [link], Marco and Weiss [link], [link], [link], Daubechies [link], and
others [link] have used some form of averaging or "best basis" transform to obtain shift invariance.


Recent results indicate this nondecimated DWT, together with thresholding, may be the best
denoising strategy [link], [link], [link], [link], [link], [link], [link], [link]. The nondecimated DWT is
shift invariant, is less affected by noise, quantization, and error, and has order $N \log N$ storage and
arithmetic complexity. It combines with thresholding to give denoising and compression superior to
the classical Donoho method for many examples. Further discussion of use of the RDWT can be
found in Section: Nonlinear Filtering or Denoising with the DWT.


Adaptive Construction of Frames and Bases 


In the case of the redundant discrete wavelet transform just described, an overcomplete expansion
system was constructed in such a way as to be a tight frame. This allowed a single linear shift-invariant
system to describe a very wide set of signals; however, the description was not adapted to the
characteristics of the signal. Recent research has been quite successful in constructing expansion
systems adaptively so as to give high sparsity and superresolution but at a cost of added computation
and being nonlinear. This section will look at some of the recent results in this area [link], [link],
[link], [link].


While use of an adaptive paradigm results in a shift-invariant orthogonal transform, it is nonlinear. It
has the property of $\mathrm{DWT}\{a f(x)\} = a\,\mathrm{DWT}\{f(x)\}$, but it does not satisfy superposition, i.e.,
$\mathrm{DWT}\{f(x) + g(x)\} \neq \mathrm{DWT}\{f(x)\} + \mathrm{DWT}\{g(x)\}$. That can sometimes be a problem.


Since these finite dimensional overcomplete systems are a frame, a subset of the expansion vectors 
can be chosen to be a basis while keeping most of the desirable properties of the frame. This is 
described well by Chen and Donoho in [link], [link]. Several of these methods are outlined as 
follows: 


• The method of frames (MOF) was first described by Daubechies [link], [link], [link] and uses
the rather straightforward idea of solving the overcomplete frame (underdetermined set of
equations) in [link] by minimizing the $L^2$ norm of $\alpha$. Indeed, this is one of the classical
definitions of solving the normal equations or use of a pseudo-inverse. That can easily be done
in Matlab by a = pinv(X)*y. This gives a frame solution, but it is usually not sparse.

• The best orthogonal basis method (BOB) was proposed by Coifman and Wickerhauser [link],
[link] to adaptively choose a best basis from a large collection. The method is fast (order
$N \log N$) but not necessarily sparse.

• Mallat and Zhang [link] proposed a sequential selection scheme called matching pursuit (MP)
which builds a basis, vector by vector. The efficiency of the algorithm depends on the order in
which vectors are added. If poor choices are made early, it takes many terms to correct them.
Typically this method also does not give sparse representations.

• A method called basis pursuit (BP) was proposed by Chen and Donoho [link], [link] which
solves [link] while minimizing the $L^1$ norm of $\alpha$. This is done by linear programming and
results in a globally optimal solution. It is similar in philosophy to the MOF but uses an $L^1$
norm rather than an $L^2$ norm and uses linear programming to obtain the optimization. Using
interior point methods, it is reasonably efficient and usually gives a fairly sparse solution.

• Krim et al. describe a best basis method in [link]. Tewfik et al. propose a method called optimal
subset selection in [link] and others are [link], [link].
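Of the methods listed above, matching pursuit is the simplest to sketch. The version below is a plain greedy loop over a dictionary of unit-norm vectors: at each step it picks the vector with the largest inner product with the current residual and subtracts that component. It illustrates the idea rather than reproducing the algorithm of Mallat and Zhang in detail, and the dictionary is the four-vector frame from the matrix example earlier in this chapter.

```python
import math

S = 1.0 / math.sqrt(2.0)
# Unit-norm dictionary vectors: the four-vector frame of the earlier example.
ATOMS = [[1.0, 0.0], [0.0, 1.0], [S, S], [S, -S]]

def matching_pursuit(y, atoms, n_terms, tol=1e-12):
    """Greedy expansion of y; returns (coefficient list, final residual)."""
    residual = list(y)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_terms):
        products = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        best = max(range(len(atoms)), key=lambda k: abs(products[k]))
        if abs(products[best]) <= tol:
            break  # nothing left to explain
        coeffs[best] += products[best]
        residual = [r - products[best] * a for r, a in zip(residual, atoms[best])]
    return coeffs, residual

coeffs, residual = matching_pursuit([0.9806, 0.1961], ATOMS, n_terms=2)
print([round(v, 4) for v in coeffs], [round(v, 4) for v in residual])
```

The result depends on the order in which vectors are selected, which is the sensitivity to early choices noted in the list above.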


All of these methods are very signal and problem dependent and, in some cases, can give much better 
results than the standard M-band or wavelet packet based methods. 


Local Trigonometric Bases 


In the material up to this point, all of the expansion systems have required the translation and scaling 
properties of [link] and the satisfaction of the multiresolution analysis assumption of [link]. From this 
we have been able to generate orthogonal basis systems with the basis functions having compact 
support and, through generalization to M-band wavelets and wavelet packets, we have been able to 
allow a rather general tiling of the time-frequency or time-scale plane with flexible frequency 
resolution. 


By giving up the multiresolution analysis (MRA) requirement, we will be able to create another basis
system with a time-frequency tiling somewhat the dual of the wavelet or wavelet packet system.
Much as we saw the multiresolution system dividing the frequency bands in a logarithmic spacing for
the $M = 2$ systems and a linear spacing for the higher $M$ case, and a rather general form for the
wavelet packets, we will now develop the local cosine and local sine basis systems for a more
flexible time segmenting of the time-frequency plane. Rather than modifying the MRA systems by
creating the time-varying wavelet systems, we will abandon the MRA and build a basis directly.


What we are looking for is an expansion of a signal or function in the form


Equation:
$$f(t) = \sum_{k,n} a_k(n)\,\chi_{k,n}(t),$$


where the functions $\chi_{k,n}(t)$ are of the form (for example)
Equation:


$$\chi_{k,n}(t) = w_k(t)\,\cos(\alpha\pi(n + \beta)t + \gamma).$$


Here $w_k(t)$ is a window function giving localization to the basis function and $\alpha$, $\beta$ and $\gamma$ are
constants, the choice of which we will get to shortly. $k$ is a time index while $n$ is a frequency index.
By requiring orthogonality of these basis functions, the coefficients (the transform) are found by an
inner product

Equation:


$$a_k(n) = \langle f(t), \chi_{k,n}(t) \rangle = \int f(t)\,\chi_{k,n}(t)\,dt.$$


We will now examine how this can be achieved and what the properties of the expansion are. 


Fundamentally, the wavelet packet system decomposes $L^2(\mathbb{R})$ into a direct sum of orthogonal
spaces, each typically covering a certain frequency band and spanned by the translates of a particular
element of the wavelet packet system. With wavelet packets, time-frequency tiling with flexible
frequency resolution is possible. However, the temporal resolution is determined by the frequency
band associated with a particular element in the packet.


Local trigonometric bases [link], [link] are duals of wavelet packets in the sense that these bases give
flexible temporal resolution. In this case, $L^2(\mathbb{R})$ is decomposed into a direct sum of spaces each
typically covering a particular time interval. The basis functions are all modulates of a fixed window
function.


One could argue that an obvious approach is to partition the time axis into disjoint bins and use a
Fourier series expansion in each temporal bin. However, since the basis functions are "rectangular-windowed"
exponentials they are discontinuous at the bin boundaries and hence undesirable in the
analysis of smooth signals. If one replaces the rectangular window with a "smooth" window, then,
since products of smooth functions are smooth, one can generate smooth windowed exponential basis
functions. For example, if the time axis is split uniformly, one is looking at basis functions of the
form $\{w(t - k)\,e^{i2\pi nt}\}$, $k, n \in \mathbb{Z}$, for some smooth window function $w(t)$. Unfortunately,
orthonormality disallows the function $w(t)$ from being well-concentrated in both time and frequency,
which is undesirable for time-frequency analysis. More precisely, the Balian-Low theorem (see p.108
in [link]) states that the Heisenberg product of $w$ (the product of the time-spread and frequency-spread
which is lower bounded by the Heisenberg uncertainty principle) is infinite. However, it turns out that
windowed trigonometric bases (that use cosines and sines but not exponentials) can be orthonormal,
and the window can have a finite Heisenberg product [link]. That is the reason why we are looking
for local trigonometric bases of the form given in [link].


Nonsmooth Local Trigonometric Bases 


To construct local trigonometric bases we have to choose: (a) the window functions $w_k(t)$; and (b)
the trigonometric functions (i.e., $\alpha$, $\beta$ and $\gamma$ in Eq. [link]). If we use the rectangular window (which
we know is a bad choice), then it suffices to find a trigonometric basis for the interval that the
window spans. Without loss of generality, we could consider the unit interval $(0, 1)$ and hence we are
interested in trigonometric bases for $L^2((0, 1))$. It is easy to see that the following four sets of
functions satisfy this requirement.


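$$\{\Phi_n(t)\} = \{\sqrt{2}\,\cos(\pi(n + \tfrac{1}{2})t)\}, \quad n \in \{0, 1, 2, \dots\};$$
$$\{\Phi_n(t)\} = \{\sqrt{2}\,\sin(\pi(n + \tfrac{1}{2})t)\}, \quad n \in \{0, 1, 2, \dots\};$$
$$\{\Phi_n(t)\} = \{1, \sqrt{2}\,\cos(\pi n t)\}, \quad n \in \{1, 2, \dots\};$$
$$\{\Phi_n(t)\} = \{\sqrt{2}\,\sin(\pi n t)\}, \quad n \in \{1, 2, \dots\}.$$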


Indeed, these orthonormal bases are obtained from the Fourier series on $(-2, 2)$ (the first two) and on
$(-1, 1)$ (the last two) by appropriately imposing symmetries and hence are readily verified to be
complete and orthonormal on $(0, 1)$. If we choose a set of nonoverlapping rectangular window
functions $w_k(t)$ such that $\sum_k w_k(t) = 1$ for all $t \in \mathbb{R}$, and define $\chi_{k,n}(t) = w_k(t)\Phi_n(t)$, then
$\{\chi_{k,n}(t)\}$ is a local trigonometric basis for $L^2(\mathbb{R})$, for each of the four choices of $\Phi_n(t)$ above.
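Orthonormality on the unit interval can be checked numerically. The sketch below approximates the inner products by a midpoint rule for two families, the half-integer sines √2 sin(π(n + 1/2)t) and the constant together with the integer cosines √2 cos(πnt). The quadrature, the number of sample points, and the function names are assumptions of this sketch, and only the first few basis functions are tested.

```python
import math

def inner_product(f, g, samples=2000):
    """Midpoint-rule approximation of the integral of f(t) g(t) over (0, 1)."""
    h = 1.0 / samples
    return h * sum(f((i + 0.5) * h) * g((i + 0.5) * h) for i in range(samples))

def half_integer_sine(n):
    """sqrt(2) sin(pi (n + 1/2) t), n = 0, 1, 2, ..."""
    return lambda t: math.sqrt(2.0) * math.sin(math.pi * (n + 0.5) * t)

def integer_cosine(n):
    """The constant 1 for n = 0, and sqrt(2) cos(pi n t) for n >= 1."""
    if n == 0:
        return lambda t: 1.0
    return lambda t: math.sqrt(2.0) * math.cos(math.pi * n * t)

def gram(family, count):
    """Matrix of inner products between the first `count` members of a family."""
    basis = [family(n) for n in range(count)]
    return [[inner_product(f, g) for g in basis] for f in basis]

for family in (half_integer_sine, integer_cosine):
    G = gram(family, 4)
    print([[round(v, 6) for v in row] for row in G])
```

A Gram matrix equal to the identity confirms orthonormality of the tested functions. Completeness is a separate property that this check does not address.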


Construction of Smooth Windows 


We know how to construct orthonormal trigonometric bases for disjoint temporal bins or intervals. 
Now we need to construct smooth windows $w_k(t)$ that, when applied to cosines and sines, retain
orthonormality. An outline of the process is as follows: A unitary operation is applied that “unfolds” 
the discontinuities of all the local basis functions at the boundaries of each temporal bin. Unfolding 
leads to overlapping (unfolded) basis functions. However, since unfolding is unitary, the resulting 
functions still form an orthonormal basis. The unfolding operator is parameterized by a function r(t) 
that satisfies an algebraic constraint (which makes the operator unitary). The smoothness of the 
resulting basis functions depends on the smoothness of this underlying function r(t). 


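The function $r(t)$, referred to as a rising cutoff function, satisfies the following conditions (see [link])
Equation:


$$|r(t)|^2 + |r(-t)|^2 = 1 \ \text{ for all } t \in \mathbb{R}; \qquad r(t) = \begin{cases} 0, & \text{if } t \le -1, \\ 1, & \text{if } t \ge 1. \end{cases}$$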


$r(t)$ is called a rising cutoff function because it rises from 0 to 1 in the interval $[-1, 1]$ (note: it does
not necessarily have to be monotone increasing). Multiplying a function by $r(t)$ would localize it to
$[-1, \infty)$. Every real-valued function $r(t)$ satisfying [link] is of the form $r(t) = \sin(\theta(t))$ where
Equation:


$$\theta(t) + \theta(-t) = \frac{\pi}{2} \ \text{ for all } t \in \mathbb{R}; \qquad \theta(t) = \begin{cases} 0, & \text{if } t \le -1, \\ \frac{\pi}{2}, & \text{if } t \ge 1. \end{cases}$$


This ensures that $r(-t) = \sin(\theta(-t)) = \sin(\frac{\pi}{2} - \theta(t)) = \cos(\theta(t))$ and therefore
$r^2(t) + r^2(-t) = 1$. One can easily construct arbitrarily smooth rising cutoff functions. We give
one such recipe from [link] (p.105). Start with a function


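Equation:
$$r_{[0]}(t) = \begin{cases} 0, & \text{if } t \le -1, \\ \sin\left(\frac{\pi}{4}(1 + t)\right), & \text{if } -1 < t < 1, \\ 1, & \text{if } t \ge 1. \end{cases}$$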


It is readily verified to be a rising cutoff function. Now recursively define $r_{[1]}(t), r_{[2]}(t), \dots$ as
follows:
Equation:

$$r_{[n+1]}(t) = r_{[n]}\left(\sin\left(\frac{\pi}{2}t\right)\right).$$

Notice that $r_{[n]}(t)$ is a rising cutoff function for every $n$. Moreover, by induction on $n$ it is easy to
show that $r_{[n]}(t) \in C^{2^n - 1}$ (it suffices to show that derivatives at $t = -1$ and $t = 1$ exist and are zero
up to order $2^n - 1$).
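A short numerical check makes the recipe concrete. The sketch below assumes the starting function sin(π(1 + t)/4) on (−1, 1) and the recursion that replaces t by sin(πt/2) at each step. If the cited recipe differs in detail, treat both as assumptions of this sketch. It then verifies the defining identity r(t)² + r(−t)² = 1 on a grid.

```python
import math

def rising_cutoff(t, n=0):
    """r_[n](t), built from r_[0](t) = sin(pi/4 (1 + t)) on (-1, 1).

    Applying r_[n+1](t) = r_[n](sin(pi t / 2)) n times is the same as
    substituting t -> sin(pi t / 2) n times before evaluating r_[0].
    """
    if t <= -1.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    for _ in range(n):
        t = math.sin(math.pi * t / 2.0)
    return math.sin(math.pi / 4.0 * (1.0 + t))

grid = [k / 10.0 for k in range(-15, 16)]
for n in (0, 1, 3):
    worst = max(abs(rising_cutoff(t, n) ** 2 + rising_cutoff(-t, n) ** 2 - 1.0)
                for t in grid)
    print(n, worst)
```

Larger `n` flattens the function near t = −1 and t = 1, which is what makes the resulting windows smoother.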


Folding and Unfolding 


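Using a rising cutoff function $r(t)$ one can define the folding operator, $U$, and its inverse, the
unfolding operator $U^*$, as follows:
Equation:

$$U(r)f(t) = \begin{cases} r(t)f(t) + r(-t)f(-t), & \text{if } t > 0, \\ \overline{r(-t)}\,f(t) - \overline{r(t)}\,f(-t), & \text{if } t < 0, \end{cases}$$

Equation:

$$U^*(r)f(t) = \begin{cases} \overline{r(t)}\,f(t) - r(-t)f(-t), & \text{if } t > 0, \\ r(-t)f(t) + \overline{r(t)}\,f(-t), & \text{if } t < 0. \end{cases}$$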

Notice that $U^*(r)U(r)f(t) = (|r(t)|^2 + |r(-t)|^2)f(t) = U(r)U^*(r)f(t)$ and that
$\|U(r)f\| = \|f\| = \|U^*(r)f\|$, showing that $U(r)$ is a unitary operator on $L^2(\mathbb{R})$. Also, these
operators change $f(t)$ only in $[-1, 1]$ since $U(r)f(t) = f(t) = U^*(r)f(t)$ for $t \le -1$ and $t \ge 1$.
The interval $[-1, 1]$ is the action region of the folding/unfolding operator. $U(r)$ is called a folding
operator acting at zero because for smooth $f$, $U(r)f$ has a discontinuity at zero. By translation and
dilation of $r(t)$ one can define $U(r, t_0, \epsilon)$ and $U^*(r, t_0, \epsilon)$ that fold and unfold, respectively, about
$t = t_0$ with action region $[t_0 - \epsilon, t_0 + \epsilon]$ and action radius $\epsilon$.
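The unitarity claim can be checked pointwise with a small sketch. It assumes a real-valued rising cutoff function and the folding rule r(t)f(t) + r(−t)f(−t) for t > 0 and r(−t)f(t) − r(t)f(−t) for t < 0, with the signs of the second terms swapped for unfolding. These formulas are stated here as assumptions of the sketch, and the helper names are hypothetical.

```python
import math

def r(t):
    """A simple real-valued rising cutoff function."""
    if t <= -1.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    return math.sin(math.pi / 4.0 * (1.0 + t))

def fold(f, r):
    """Folding operator acting at zero (assumed formulas, real-valued r)."""
    def folded(t):
        if t > 0:
            return r(t) * f(t) + r(-t) * f(-t)
        if t < 0:
            return r(-t) * f(t) - r(t) * f(-t)
        raise ValueError("the value at t = 0 is not defined")
    return folded

def unfold(f, r):
    """Unfolding operator, the inverse of fold for a rising cutoff function."""
    def unfolded(t):
        if t > 0:
            return r(t) * f(t) - r(-t) * f(-t)
        if t < 0:
            return r(-t) * f(t) + r(t) * f(-t)
        raise ValueError("the value at t = 0 is not defined")
    return unfolded

f = lambda t: 1.0 + t + t * t
Uf = fold(f, r)
round_trip = unfold(Uf, r)
points = [k / 8.0 for k in range(-16, 17) if k != 0]
print(max(abs(round_trip(t) - f(t)) for t in points))
```

Outside the action region the folded function coincides with the original, while the one-sided limits of `Uf` at zero differ, which is the discontinuity introduced by folding.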


Notice that [link] and [link] do not define the values $U(r)f(0)$ and $U^*(r)f(0)$ because of the
discontinuity that is potentially introduced. An elementary exercise in calculus divulges the nature of
this discontinuity. If $f \in C^d(\mathbb{R})$, then $U(r)f \in C^d(\mathbb{R} \setminus \{0\})$. At $t = 0$, left and right derivatives
exist with all even-order left-derivatives and all odd-order right-derivatives (up to and including $d$)
being zero. Conversely, given any function $f \in C^d(\mathbb{R} \setminus \{0\})$ which has a discontinuity of the above
type, $U^*(r)f$ has a unique extension across $t = 0$ (i.e., a choice of value for $(U^*(r)f)(0)$) that is
in $C^d(\mathbb{R})$. One can switch the signs in [link] and [link] to obtain another set of folding and unfolding
operators. In this case, for $f \in C^d(\mathbb{R})$, $U(r)f$ will have its even-order right derivatives and
odd-order left derivatives equal to zero. We will use $U_+$, $U_+^*$ and $U_-$, $U_-^*$, respectively, to distinguish
between the two types of folding/unfolding operators and call them positive and negative polarity
folding/unfolding operators, respectively.


So far we have seen that the folding operator is associated with a rising cutoff function, acts at a 
certain point, has a certain action region and radius and has a certain polarity. To get a qualitative idea 
of what these operators do, let us look at some examples. 


First, consider a case where f(t) is even- or odd-symmetric about the folding point on the action 
interval. Then, Uf corresponds to simply windowing f by an appropriate window function. Indeed, if 
f(t) = f(-t) on [-1, 1], 

Equation: 

$$U_+(r,0,1) f(t) = \begin{cases} (r(t) + r(-t))\, f(t), & \text{if } t > 0, \\ (r(-t) - r(t))\, f(t), & \text{if } t < 0, \end{cases}$$

and if f(t) = -f(-t) on [-1, 1], 
Equation: 

$$U_+(r,0,1) f(t) = \begin{cases} (r(t) - r(-t))\, f(t), & \text{if } t > 0, \\ (r(-t) + r(t))\, f(t), & \text{if } t < 0. \end{cases}$$


[link] shows a rising cutoff function and the action of the folding operators of both polarities on the 
constant function. Observe the nature of the discontinuity at t = 0 and the effect of polarity reversal. 

The Rising Cutoff Function $r(t) = r_{[3]}(t)$ 


We saw that for signals with symmetry in the action region, folding corresponds to windowing. Next 
we look at signals that are supported to the right (or left) of the folding point and see what unfolding 
does to them. In this case, $U_+^{*}(r)f$ is obtained by windowing the even (or odd) extension of f(t) 
about the folding point. Indeed, if f(t) = 0 for t < 0, 
Equation: 

$$U_+^{*}(r,0,1) f(t) = \begin{cases} r(t) f(t), & \text{if } t > 0, \\ r(t) f(-t), & \text{if } t < 0, \end{cases}$$

and if f(t) = 0 for t > 0, 
Equation: 

$$U_+^{*}(r,0,1) f(t) = \begin{cases} -r(-t) f(-t), & \text{if } t > 0, \\ r(-t) f(t), & \text{if } t < 0. \end{cases}$$


[link] shows the effect of the positive unfolding operator acting on cosine and sine functions 
supported on the right and left half-lines, respectively. Observe that unfolding removes the 
discontinuities at t = 0. If the polarity is reversed, the effects on signals on the half-line are switched; 
the right half-line is associated with windowed odd extensions and the left half-line with windowed even 
extensions. 

Folding Functions Supported on Half-Lines: (a) a cosine f(t) supported on the right half-line, i.e., 
multiplied by u(t), where u(t) is the unit step or Heaviside function; (b) $U_+^{*}(r_{[3]}, 0, 1) f(t)$; 
(c) a sine f(t) supported on the left half-line; (d) $U_+^{*}(r_{[3]}, 0, 1) f(t)$ 


Local Cosine and Sine Bases 


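Recall the four orthonormal trigonometric bases for $L^2((0, 1))$ we described earlier. 


1. $\{\psi_n(t)\} = \{\sqrt{2}\cos(\pi(n + \frac{1}{2})t)\}$, $n \in \{0, 1, 2, \ldots\}$; 
2. $\{\psi_n(t)\} = \{\sqrt{2}\sin(\pi(n + \frac{1}{2})t)\}$, $n \in \{0, 1, 2, \ldots\}$; 
3. $\{\psi_n(t)\} = \{1, \sqrt{2}\cos(\pi n t)\}$, $n \in \{1, 2, \ldots\}$; 
4. $\{\psi_n(t)\} = \{\sqrt{2}\sin(\pi n t)\}$, $n \in \{1, 2, \ldots\}$. 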


The basis functions have discontinuities at t = 0 and t = 1 because they are restrictions of the 
cosines and sines to the unit interval by rectangular windowing. The natural extensions of these basis 
functions to $t \in \mathbb{R}$ (i.e., unwindowed cosines and sines) are either even (say “+”) or odd (say “-”) 
symmetric (locally) about the endpoints t = 0 and t = 1. Indeed the basis functions for the four cases 
are (+, -), (-, +), (+, +) and (-, -) symmetric, respectively, at (0, 1). From the preceding 
analysis, this means that unfolding these basis functions corresponds to windowing if the unfolding 
operator has the right polarity. Also observe that the basis functions are discontinuous at the 
endpoints. Moreover, depending on the symmetry at each endpoint, all odd derivatives (for “+” 
symmetry) or even derivatives (for “-” symmetry) are zero. By choosing unfolding operators of 
appropriate polarity at the endpoints (with nonoverlapping action regions) for the four bases, we get 
smooth basis functions of compact support. For example, for (+, -) symmetry, the basis function 
$U_+^{*}(r_0, 0, \epsilon_0)\, U_+^{*}(r_1, 1, \epsilon_1)\, \psi_n(t)$ is supported in $(-\epsilon_0, 1 + \epsilon_1)$ and is as many times continuously 
differentiable as $r_0$ and $r_1$ are. 


Let $\{t_i\}$ be an ordered set of points in $\mathbb{R}$ defining a partition into disjoint intervals $I_i = [t_i, t_{i+1}]$. 
Now choose one of the four bases above for each interval such that at $t_i$ the basis functions for $I_{i-1}$ 
and those for $I_i$ have opposite symmetries. We say the polarity at $t_i$ is positive if the symmetry is 
-)(+ and negative if it is +)(-. At each $t_i$ choose a smooth cutoff function $r_i(t)$ and action radius 
$\epsilon_i$ so that the action intervals do not overlap. Let p(j) be the polarity of $t_j$ and define the unitary 
operator 

Equation: 

$$U^{*} = \prod_{j} U_{p(j)}^{*}(r_j, t_j, \epsilon_j).$$


Let $\{\psi_n(t)\}$ denote all the basis functions for all the intervals put together. Then $\{\psi_n(t)\}$ forms a 
nonsmooth orthonormal basis for $L^2(\mathbb{R})$. Simultaneously, $\{U^{*}\psi_n(t)\}$ forms a smooth 
orthonormal basis for $L^2(\mathbb{R})$. To find the expansion coefficients of a function f(t) in this basis we 
use 
Equation: 

$$\langle f, U^{*}\psi_n \rangle = \langle U f, \psi_n \rangle.$$


In other words, to compute the expansion coefficients of f in the new (smooth) basis, one merely 
folds f to Uf and finds its expansion coefficients with respect to the original basis. This allows one 
to exploit fast algorithms available for coefficient computation in the original basis. 
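
The identity $\langle f, U^{*}\psi_n\rangle = \langle Uf, \psi_n\rangle$ can be illustrated on a sampled grid. The following sketch is written under several assumptions: positive-polarity folding with $r_{[0]}$ as the cutoff, a single interval (0, 1) with folding points 0 and 1 and action radius 0.25, and the first cosine basis restricted to (0, 1). The helper `fold_at` and all parameter values are hypothetical.

```python
import numpy as np

def r0(s):
    return np.sin(0.25 * np.pi * (1.0 + np.clip(s, -1.0, 1.0)))

def fold_at(f, t, t0, eps, adjoint=False):
    """Assumed positive-polarity U(r0, t0, eps), or its adjoint U*, on a grid symmetric about t0."""
    out = f.copy()
    idx = np.flatnonzero(np.abs(t - t0) < eps)    # samples inside the action region
    s = (t[idx] - t0) / eps                       # map the action region to (-1, 1)
    g, gm = f[idx], f[idx][::-1]                  # g(t) and its mirror image g(2*t0 - t)
    sign = -1.0 if adjoint else 1.0
    out[idx] = np.where(s > 0,
                        r0(s) * g + sign * r0(-s) * gm,
                        r0(-s) * g - sign * r0(s) * gm)
    return out

dt = 1.0 / 200
t = -0.5 + (np.arange(400) + 0.5) * dt            # midpoints covering (-0.5, 1.5)
eps = 0.25                                        # action regions around 0 and 1 do not overlap
n = 3
psi = np.where((t > 0) & (t < 1), np.sqrt(2.0) * np.cos(np.pi * (n + 0.5) * t), 0.0)
f = np.exp(-(t - 0.4) ** 2) * np.sin(7.0 * t)     # arbitrary test signal

U_f = fold_at(fold_at(f, t, 0.0, eps), t, 1.0, eps)
Ustar_psi = fold_at(fold_at(psi, t, 0.0, eps, adjoint=True), t, 1.0, eps, adjoint=True)

coef_smooth = np.dot(f, Ustar_psi) * dt           # <f, U* psi_n>
coef_folded = np.dot(U_f, psi) * dt               # <U f, psi_n>
```

The two coefficients agree to rounding error, and the unfolded basis function no longer jumps at t = 0. In practice only the second form is computed, since the inner products with the unwindowed cosines can be obtained with a fast trigonometric transform.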


So for an arbitrary choice of polarities at the end points $t_i$ we have smooth local trigonometric bases. 
In particular, by choosing the polarity to be positive for all $t_i$ (consistent with the choice of the first 
basis in all intervals) we get local cosine bases. If the polarity is negative for all $t_i$ (consistent with 
the choice of the second basis for all intervals), we get local sine bases. Alternating choice of polarity 
(consistent with the alternating choice of the third and fourth bases in the intervals) leads to 
alternating cosine/sine bases. 

Trigonometric basis functions - before and after unfolding: panels (a), (c), (e), (g) show a sample 
basis function f(t), with n = 10, from each of the four bases, restricted to the interval (0, 4) by 
u(t)u(4 - t); panels (b), (d), (f), (h) show the corresponding functions after unfolding with $r_{[3]}$ 
at t = 0 and t = 4 with action radius 1. 


All these bases can be constructed in discrete time by sampling the cosine/sine basis functions 
[link]. Local cosine bases in discrete time were constructed originally by Malvar and are sometimes 
called lapped orthogonal transforms [link]. In the discrete case, the efficient implementation of 
trigonometric transforms (using DCT I-IV and DST I-IV) can be utilized after folding. In this case, 
expanding in local trigonometric bases corresponds to computing a DCT after preprocessing (or 
folding) the signal. 


For a sample basis function in each of the four bases, [link] shows the corresponding smooth basis 
function after unfolding. Observe that for local cosine and sine bases, the basis functions are not 
linear phase; while the window is symmetric, the windowed functions are not. However, for 
alternating sine/cosine bases the (unfolded) basis functions are linear phase. There is a link between 
local sine (or cosine) bases and modulated filter banks that cannot have linear phase filters (discussed 
in Chapter: Filter Banks and Transmultiplexers). So there is also a link between alternating 
cosine/sine bases and linear-phase modulated filter banks (again see Chapter: Filter Banks and 
Transmultiplexers). This connection is further explored in [link]. 


Local trigonometric bases have been applied to several signal processing problems. For example, 
they have been used in adaptive spectral analysis and in the segmentation of speech into voiced and 
unvoiced regions [link]. They are also used for image compression and are known in the literature as 
lapped-orthogonal transforms [link]. 


Signal Adaptive Local Trigonometric Bases 


In the adaptive wavelet packet analysis described in [link], we considered a full filter bank tree of 
decompositions and used some algorithm (best-basis algorithm, for instance) to prune the tree to get 
the best tree topology (equivalently frequency partition) for a given signal. The idea here is similar. 
We partition the time axis (or interval) into bins and successively refine each partition into further 
bins, giving a tree of partitions for the time axis (or interval). If we use smooth local trigonometric 
bases at each of the leaves of a full or pruned tree, we get a smooth basis for all signals on the time 
axis (or interval). In adaptive local bases one grows a full tree and prunes it based on some criterion 
to get the optimal set of temporal bins. 


[link] schematically shows a sample time-frequency tiling associated with a particular local 
trigonometric basis. Observe that this is the dual of a wavelet packet tiling (see [link])—in the sense 
that one can switch the time and frequency axes to go between the two. 


Discrete Multiresolution Analysis, the Discrete-Time Wavelet Transform, and the 
Continuous Wavelet Transform 


Up to this point, we have developed wavelet methods using the series wavelet expansion of 
continuous-time signals called the discrete wavelet transform (DWT), even though it probably should 
be called the continuous-time wavelet series. This wavelet expansion is analogous to the 
Fourier series in that both are series expansions that transform continuous-time signals into a discrete 
sequence of coefficients. However, unlike the Fourier series, the DWT can be made periodic or 
nonperiodic and, therefore, is more versatile and practically useful. 


In this chapter we will develop a wavelet method for expanding discrete-time signals in a series 
expansion since, in most practical situations, the signals are already in the form of discrete samples. 
Indeed, we have already discussed when it is possible to use samples of the signal as scaling function 
expansion coefficients in order to use the filter bank implementation of Mallat's algorithm. We find 
there is an intimate connection between the DWT and DTWT, much as there is between the Fourier 
series and the DFT. One expands signals with the FS but often implements that with the DFT. 


To further generalize the DWT, we will also briefly present the continuous wavelet transform which, 
similar to the Fourier transform, transforms a function of continuous time to a representation with 
continuous scale and translation. In order to develop the characteristics of these various wavelet 
representations, we will often call on analogies with corresponding Fourier representations. However, 
it is important to understand the differences between Fourier and wavelet methods. Much of that 
difference is connected to the wavelet being concentrated in both time and scale or frequency, to the 
periodic nature of the Fourier basis, and to the choice of wavelet bases. 


Discrete Multiresolution Analysis and the Discrete-Time Wavelet Transform 


Parallel to the developments in early chapters on multiresolution analysis, we can define a discrete 
multiresolution analysis (DMRA) for $\ell^2$, where the basis functions are discrete sequences [link], 
[link], [link]. The expansion of a discrete-time signal in terms of discrete-time basis functions is 
expressed in a form parallel to [link] as 

Equation: 

$$f(n) = \sum_{j,k} d_j(k)\, \psi(2^j n - k)$$

where $\psi(m)$ is the basic expansion function of an integer variable m. If these expansion functions are 
an orthogonal basis (or form a tight frame), the expansion coefficients (discrete-time wavelet 
transform) are found from an inner product by 

Equation: 

$$d_j(k) = \langle f(n), \psi(2^j n - k) \rangle = \sum_{n} f(n)\, \psi(2^j n - k).$$

If the expansion functions are not orthogonal or even independent but do span $\ell^2$, a biorthogonal 
system or a frame can be formed such that a transform and inverse can be defined. 


Because there is no underlying continuous-time scaling function or wavelet, many of the questions, 
properties, and characteristics of the analysis using the DWT in Chapter: Introduction to Wavelets, 
Wavelet System Design , etc. do not arise. In fact, because of the filter bank structure for calculating 
the DTWT, the design is often done using multirate frequency domain techniques, e.g., the work by 
Smith and Barnwell and associates [link]. The questions of zero wavelet moments posed by 
Daubechies, which are related to ideas of convergence for iterations of filter banks, and Coifman's 
zero scaling function moments that were shown to help approximate inner products by samples, seem 
to have no DTWT interpretation. 


The connections between the DTWT and DWT are: 


• If the starting sequences are the scaling coefficients for the continuous multiresolution analysis 
at very fine scale, then the discrete multiresolution analysis generates the same coefficients as 
does the continuous multiresolution analysis on dyadic rationals. 

• When the number of scales is large, the basis sequences of the discrete multiresolution analysis 
converge in shape to the basis functions of the continuous multiresolution analysis. 


The DTWT or DMRA is often described by a matrix operator. This is especially easy if the transform 
is made periodic, much as the Fourier series or DFT are. For the discrete-time wavelet transform 
(DTWT), a matrix operator relates a vector of inputs to a vector of outputs. Several references on 
this approach are in [link], [link], [link], [link], [link], [link], [link], 
[link]. 
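
To make the matrix-operator view concrete, here is a minimal sketch that builds a periodized DTWT matrix for the Haar filters and checks that it is orthogonal. The Haar system is chosen only for brevity (its two-tap filters never wrap around the period); the function names and the length-8 example are assumptions for illustration.

```python
import numpy as np

def haar_stage(n):
    """One Haar analysis stage as an n-by-n orthogonal matrix (n even)."""
    W = np.zeros((n, n))
    a = 1.0 / np.sqrt(2.0)
    for k in range(n // 2):
        W[k, 2 * k], W[k, 2 * k + 1] = a, a                      # scaling (lowpass) rows
        W[n // 2 + k, 2 * k], W[n // 2 + k, 2 * k + 1] = a, -a   # wavelet (highpass) rows
    return W

def haar_dtwt_matrix(n, levels):
    """Cascade the stages: each level re-analyzes only the current scaling coefficients."""
    T = np.eye(n)
    m = n
    for _ in range(levels):
        S = np.eye(n)
        S[:m, :m] = haar_stage(m)
        T = S @ T
        m //= 2
    return T

T = haar_dtwt_matrix(8, 3)
x = np.arange(8, dtype=float)
coeffs = T @ x            # forward DTWT as a matrix-vector product
x_rec = T.T @ coeffs      # inverse is the transpose because T is orthogonal
```

For longer filters the same construction applies, with each row wrapping around the period, which is the source of the aliasing in the periodized transform.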


Continuous Wavelet Transforms 


The natural extension of the redundant DWT in [link] is to the continuous wavelet transform (CWT), 
which transforms a continuous-time signal into a wavelet transform that is a function of continuous 
shift or translation and a continuous scale. This transform is analogous to the Fourier transform. 
It is redundant, which results in a transform that is easier to interpret, is shift invariant, and is 
valuable for time-frequency/scale analysis [link], [link], [link], [link], [link], [link], [link]. 


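The definition of the CWT in terms of the wavelet w(t) is given by 


Equation: 

$$F(s, \tau) = s^{-1/2} \int f(t)\, w\!\left(\frac{t - \tau}{s}\right) dt$$

where the inverse transform is 


Equation: 

$$f(t) = K \iint \frac{1}{s^{2}}\, F(s, \tau)\, w\!\left(\frac{t - \tau}{s}\right) ds\, d\tau$$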


with the normalizing constant given by 
Equation: 

$$K = \int \frac{|W(\omega)|^{2}}{|\omega|}\, d\omega,$$

with $W(\omega)$ being the Fourier transform of the wavelet w(t). In order for the wavelet to be admissible 
(for [link] to hold), $K < \infty$. In most cases, this simply requires that W(0) = 0 and that $W(\omega)$ go to 
zero ($W(\infty) = 0$) fast enough that $K < \infty$. 
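
As a worked example of the admissibility condition, the sketch below approximates K numerically for the "Mexican hat" wavelet $w(t) = (1 - t^2)e^{-t^2/2}$. This wavelet is an illustrative choice, not one singled out by the text, and the Fourier convention $W(\omega) = \int w(t)e^{-i\omega t}dt$ is assumed, under which $W(\omega) = \sqrt{2\pi}\,\omega^2 e^{-\omega^2/2}$ and the integral evaluates in closed form to $K = 2\pi$.

```python
import numpy as np

def admissibility_constant(W, w_min=1e-6, w_max=40.0, n=400_000):
    """Approximate K = integral of |W(w)|^2 / |w| over the real line (midpoint rule).

    Assumes |W| is even, so the integral over w > 0 is simply doubled.
    """
    dw = (w_max - w_min) / n
    w = w_min + (np.arange(n) + 0.5) * dw
    return 2.0 * np.sum(np.abs(W(w)) ** 2 / w) * dw

def mexican_hat_W(w):
    """Fourier transform of (1 - t^2) exp(-t^2 / 2) under the assumed convention."""
    return np.sqrt(2.0 * np.pi) * w ** 2 * np.exp(-0.5 * w ** 2)

K_hat = admissibility_constant(mexican_hat_W)
print(K_hat, 2.0 * np.pi)
```

Here W(0) = 0 removes the singularity of $1/|\omega|$ at the origin and the Gaussian decay handles large $|\omega|$, which are exactly the two requirements stated above.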


These admissibility conditions are satisfied by a very large set of functions and give very little insight 
into what basic wavelet functions should be used. In most cases, the wavelet w(t) is chosen to give as 
good localization of the energy in both time and scale as possible for the class of signals of interest. It 
is also important to be able to calculate samples of the CWT as efficiently as possible, usually 
through the DWT and Mallat's filter banks or FFTs. This, and the interpretation of the CWT, is 
discussed in [link], [link], [link], [link], [link], [link], [link], [link], [link]. 


The use of the CWT is part of a more general time-frequency analysis that may or may not use 
wavelets [link], [link], [link], [link], [link]. 


Analogies between Fourier Systems and Wavelet Systems 


In order to better understand the wavelet transforms and expansions, we will look at the various 
forms of Fourier transforms and expansion. If we denote continuous time by CT, discrete time by DT, 
continuous frequency by CF, and discrete frequency by DF, the following table will show what the 
discrete Fourier transform (DFT), Fourier series (FS), discrete-time Fourier transform (DTFT), and 
Fourier transform take as time domain signals and produce as frequency domain transforms or series. 


For example, the Fourier series takes a continuous-time input signal and produces a sequence of 
discrete-frequency coefficients while the DTFT takes a discrete-time sequence of numbers as an input 
signal and produces a transform that is a function of continuous frequency. 


        DT      CT 
DF      DFT     FS 
CF      DTFT    FT 


Continuous and Discrete Input and Output for Four Fourier Transforms 


Because the basis functions of all four Fourier transforms are periodic, the transform of a periodic 
signal (CT or DT) is a function of discrete frequency. In other words, it is a sequence of series 
expansion coefficients. If the signal is infinitely long and not periodic, the transform is a function of 
continuous frequency and the inverse is an integral, not a sum. 

Equation: 

$$\text{Periodic in time} \;\Leftrightarrow\; \text{Discrete in frequency}$$

Equation: 

$$\text{Periodic in frequency} \;\Leftrightarrow\; \text{Discrete in time}$$


A bit of thought and, perhaps, referring to appropriate materials on signal processing and Fourier 
methods will make this clear and show why so many properties of Fourier analysis are created by the 
periodic basis functions. 


Also recall that in most cases, it is the Fourier transform, discrete-time Fourier transform, or Fourier 
series that is needed but it is the DFT that can be calculated by a digital computer and that is probably 
using the FFT algorithm. If the coefficients of a Fourier series drop off fast enough or, even better, are 
zero after some harmonic, the DFT of samples of the signal will give the Fourier series coefficients. If 
a discrete-time signal has a finite nonzero duration, the DFT of its values will be samples of its 
DTFT. From this, one sees the relation of samples of a signal to the signal and the relation of the 
various Fourier transforms. 
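
The statement about finite-duration signals is easy to check numerically. In this sketch the helper `dtft` (a hypothetical name) evaluates the DTFT sum directly at chosen frequencies, and NumPy's FFT supplies the DFT; the test sequence is arbitrary.

```python
import numpy as np

def dtft(x, omega):
    """Evaluate X(omega) = sum_n x(n) exp(-j omega n) for a finite sequence starting at n = 0."""
    n = np.arange(len(x))
    return np.exp(-1j * np.outer(omega, n)) @ x

x = np.array([1.0, 2.0, 0.5, -1.0, 3.0])
N = len(x)

# The N-point DFT equals the DTFT sampled at omega_k = 2 pi k / N.
dft_err = np.max(np.abs(np.fft.fft(x) - dtft(x, 2 * np.pi * np.arange(N) / N)))

# Zero padding to M > N points samples the same DTFT on a denser frequency grid.
M = 16
padded_err = np.max(np.abs(np.fft.fft(x, M) - dtft(x, 2 * np.pi * np.arange(M) / M)))
```

Zero padding does not add information; it only interpolates the underlying continuous-frequency transform.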


Now, what is the case for the various wavelet transforms? Well, it is both similar and different. The 
table that relates the continuous and discrete variables is given below, where DW indicates discrete values 
for scale and translation given by j and k, with CW denoting continuous values for scale and 
translation. 


        DT       CT 
DW      DTWT     DWT 
CW      DTCWT    CWT 


Continuous and Discrete Input and Output for Four Wavelet Transforms 


We have spent most of this book developing the DWT, which is a series expansion of a continuous-time 
signal. Because the wavelet basis functions are concentrated in time and not periodic, both the DTWT 
and DWT will represent infinitely long signals. In most practical cases, they are made periodic to 
facilitate efficient computation. Chapter: Calculation of the Discrete Wavelet Transform gives the 
details of how the transform is made periodic. The discrete-time, continuous wavelet transform 
(DTCWT) is seldom used and not discussed here. 


The naming of the various transforms has not been consistent in the literature and this is complicated 
by the wavelet transforms having two transform variables, scale and translation. If we could rename 
all the transforms, it would be more consistent to use Fourier series (FS) or wavelet series (WS) for a 
series expansion that produced discrete expansion coefficients, Fourier transforms (FT) or wavelet 
transforms (WT) for integral expansions that produce functions of continuous frequency or scale or 
translation variable together with DT (discrete time) or CT (continuous time) to describe the input 
signal. However, in common usage, only the DTFT follows this format! 


Common   Consistent   Time     Transform   Input      Output 
name     name         C or D   C or D      periodic   periodic 
FS       CTFS         C        D           Yes        No 
DFT      DTFS         D        D           Yes        Yes 
DTFT     DTFT         D        C           No         Yes 
FT       CTFT         C        C           No         No 
DWT      CTWS         C        D           Y or N     Y or N 
DTWT     DTWS         D        D           Y or N     Y or N 
-        DTWT         D        C           N          N 
CWT      CTWT         C        C           N          N 


Continuous and Discrete, Periodic and Nonperiodic Input and Output for Transforms 


Recall that the difference between the DWT and DTWT is that the input to the DWT is a sequence of 
expansion coefficients or a sequence of inner products while the input to the DTWT is the signal 
itself, probably samples of a continuous-time signal. The Mallat algorithm or filter bank structure is 
exactly the same. The approximation of the inner products by samples is made better by zero moments 
of the scaling function (see Section: Approximation of Scaling Coefficients by Samples of the Signal) 
or by some sort of prefiltering of the samples to make them closer to the inner products [link]. 


As mentioned before, both the DWT and DTWT can be formulated as nonperiodic, on-going 
transforms for an exact expansion of infinite duration signals or they may be made periodic to handle 
finite-length or periodic signals. If they are made periodic (as in Chapter: Calculation of the Discrete 
Wavelet Transform), then there is an aliasing that takes place in the transform. Indeed, the aliasing 
has a different period at the different scales, which may make interpretation difficult. This does not 
harm the inverse transform, which uses the wavelet information to "unalias" the scaling function 
coefficients. Most (but not all) DWT, DTWT, and matrix operators use a periodized form [link]. 


Filter Banks and Transmultiplexers 


Introduction 


In this chapter, we develop the properties of wavelet systems in terms of the underlying filter banks 
associated with them. This is an expansion and elaboration of the material in Chapter: Filter Banks and 
the Discrete Wavelet Transform, where many of the conditions and properties developed from a signal 
expansion point of view in Chapter: The Scaling Function and Scaling Coefficients, Wavelet and 
Wavelet Coefficients are now derived from the associated filter bank. The Mallat algorithm uses a special 
structure of filters and downsamplers/upsamplers to calculate and invert the discrete wavelet transform. 
Such filter structures have been studied for over three decades in digital signal processing in the context 
of the filter bank and transmultiplexer problems [link], [link], [link], [link], [link], [link], [link], [link], 
[link]. Filter bank theory, besides providing efficient computational schemes for wavelet analysis, also 
gives valuable insights into the construction of wavelet bases. Indeed, some of the finer aspects of 
wavelet theory emanate from filter bank theory. 


The Filter Bank 


A filter bank is a structure that decomposes a signal into a collection of subsignals. Depending on the 
application, these subsignals help emphasize specific aspects of the original signal or may be easier to 
work with than the original signal. We have linear or non-linear filter banks depending on whether or not 
the subsignals depend linearly on the original signal. Filter banks were originally studied in the context of 
signal compression where the subsignals were used to “represent” the original signal. The subsignals 
(called subband signals) are downsampled so that the data rates are the same in the subbands as in the 
original signal—though this is not essential. Key points to remember are that the subsignals convey 
salient features of the original signal and are sufficient to reconstruct the original signal. 


[link] shows a linear filter bank that is used in signal compression (subband coding). The analysis filters 
$\{h_i\}$ are used to filter the input signal x(n). The filtered signals are downsampled to give the subband 
signals. Reconstruction of the original signal is achieved by upsampling, filtering and adding up the 
subband signals as shown in the right-hand part of [link]. The desire for perfect reconstruction (i.e., 
y(n) = x(n)) imposes a set of bilinear constraints (since all operations in [link] are linear) on the 
analysis and synthesis filters. This also constrains the downsampling factor, M, to be at most the number 
of subband signals, say L. Filter bank design involves choosing filters $\{h_i\}$ and $\{g_i\}$ that satisfy perfect 
reconstruction and simultaneously give informative and useful subband signals. In subband speech 
coding, for example, a natural choice of desired frequency responses for the analysis and synthesis 
filters (motivated by the nonuniform sensitivity of the human ear to various frequency bands) is shown in 
[link]. 


L-Band Filter Bank with Rate-Change Factor of M 


In summary, the filter bank problem involves the design of the filters $h_i(n)$ and $g_i(n)$, with the 
following goals: 


1. Perfect Reconstruction (i.e., y(n) = x(n)). 

2. Usefulness. Clearly this depends on the application. For the subband coding application, the filter 
frequency responses might approximate the ideal responses in [link]. In other applications the filters 
may have to satisfy other constraints or approximate other frequency responses. 


If the signals and filters are multidimensional in [link], we have the multidimensional filter bank design 
problem. 
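
The analysis/synthesis structure can be simulated directly. The following is a minimal sketch, not a definitive implementation: it assumes a two-channel bank (L = M = 2) built from the Daubechies length-4 filters, the unitary choice $g_i(n) = h_i(-n)$, and circular (periodized) indexing so that a finite-length signal can be processed. The function names are hypothetical.

```python
import numpy as np

def analysis_bank(x, filters, M):
    """Subband signals d_i(n) = sum_k h_i(k) x(M n - k), indices taken modulo len(x)."""
    N = len(x)
    subbands = []
    for h in filters:
        d = np.zeros(N // M)
        for n in range(N // M):
            for k, hk in enumerate(h):
                d[n] += hk * x[(M * n - k) % N]
        subbands.append(d)
    return subbands

def synthesis_bank(subbands, filters, M, N):
    """y(n) = sum_i sum_m g_i(n - M m) d_i(m), with the assumed choice g_i(n) = h_i(-n)."""
    y = np.zeros(N)
    for d, h in zip(subbands, filters):
        for m, dm in enumerate(d):
            for k, hk in enumerate(h):
                y[(M * m - k) % N] += hk * dm     # g_i(-k) = h_i(k)
    return y

s3 = np.sqrt(3.0)
h0 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))  # Daubechies length-4 lowpass
h1 = np.array([h0[3], -h0[2], h0[1], -h0[0]])                          # h1(n) = (-1)^n h0(3 - n)

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
d = analysis_bank(x, [h0, h1], M=2)
y = synthesis_bank(d, [h0, h1], M=2, N=len(x))
pr_error = np.max(np.abs(y - x))
```

With these filters the reconstruction error is at rounding level and the subbands together carry the same energy as the input, as expected for a unitary filter bank.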


Ideal Frequency Responses in an L-band Filter Bank 


Transmultiplexer 


A transmultiplexer is a structure that combines a collection of signals into a single signal at a higher rate; 
i.e., it is the dual of a filter bank. If the combined signal depends linearly on the constituent signals, we 
have a linear transmultiplexer. Transmultiplexers were originally studied in the context of converting 
time-division-multiplexed (TDM) signals into frequency-division-multiplexed (FDM) signals with the goal 
of converting back to time-division-multiplexed signals at some later point. A key point to remember is 
that the constituent signals should be recoverable from the combined signal. [link] shows the structure of 
a transmultiplexer. The input signals $y_i(n)$ are upsampled, filtered, and combined (by a synthesis bank 
of filters) to give a composite signal d(n). The signal d(n) can be filtered (by an analysis bank of filters) 
and downsampled to give a set of signals $x_i(n)$. The goal in transmultiplexer design is a choice of filters 
that ensures perfect reconstruction (i.e., for all i, $x_i(n) = y_i(n)$). This imposes bilinear constraints on 
the synthesis and analysis filters. Also, the upsampling factor must be at least the number of constituent 
input signals, say L. Moreover, in classical TDM-FDM conversion the analysis and synthesis filters must 
approximate the ideal frequency responses in [link]. If the input signals, analysis filters and synthesis 
filters are multidimensional, we have a multidimensional transmultiplexer. 


Perfect Reconstruction—A Closer Look 


We now take a closer look at the set of bilinear constraints on the analysis and synthesis filters of a filter 
bank and/or transmultiplexer that ensures perfect reconstruction (PR). Assume that there are L analysis 
filters and L synthesis filters and that downsampling/upsampling is by some integer M. These 
constraints, broadly speaking, can be viewed in three useful ways, each applicable in specific situations. 


1. Direct characterization - which is useful in wavelet theory (to characterize orthonormality and frame 
properties), in the study of a powerful class of filter banks (modulated filter banks), etc. 

2. Matrix characterization - which is useful in the study of time-varying filter banks. 

3. z-transform-domain (or polyphase-representation) characterization - which is useful in the design 
and implementation of (unitary) filter banks and wavelets. 


Direct Characterization of PR 


We will first consider the direct characterization of PR, which, for both filter banks and transmultiplexers, 
follows from an elementary superposition argument. 


Theorem 38 A filter bank is PR if and only if, for all integers $n_1$ and $n_2$, 
Equation: 

$$\sum_{i=0}^{L-1} \sum_{n} h_i(Mn + n_1)\, g_i(-Mn - n_2) = \delta(n_1 - n_2).$$

A transmultiplexer is PR if and only if, for all $i, j \in \{0, 1, \ldots, L - 1\}$, 
Equation: 

$$\sum_{n} h_i(n)\, g_j(-Ml - n) = \delta(l)\, \delta(i - j).$$

Moreover, if the number of channels is equal to the downsampling factor (i.e., L = |M|), [link] and [link] 
are equivalent. 


L-Band Transmultiplexer with Rate Change Factor of M 


Consider a PR filter bank. Since an arbitrary signal is a linear superposition of impulses, it suffices to 
consider the input signal $x(n) = \delta(n - n_1)$ for arbitrary integer $n_1$. Then (see [link]) 
$d_i(n) = h_i(Mn - n_1)$ and therefore $y(n_2) = \sum_i \sum_n g_i(n_2 - Mn)\, d_i(n)$. But by PR, 
$y(n_2) = \delta(n_2 - n_1)$. The filter bank PR property is precisely a statement of this fact 
(replacing $n_1$ and $n_2$ by $-n_1$ and $-n_2$ gives the form stated in the theorem): 
Equation: 

$$y(n_2) = \sum_i \sum_n g_i(n_2 - Mn)\, d_i(n) = \sum_i \sum_n g_i(n_2 - Mn)\, h_i(Mn - n_1) = \delta(n_2 - n_1).$$

Consider a PR transmultiplexer. Once again because of linear superposition, it suffices to consider only 
the input signals $x_i(n) = \delta(n)\, \delta(i - j)$ for all i and j. Then $d(n) = g_j(n)$ (see [link]), and 
$y_i(l) = \sum_n h_i(n)\, d(Ml - n)$. But by PR, $y_i(l) = \delta(l)\, \delta(i - j)$. The transmultiplexer PR property is 
precisely a statement of this fact (replacing l by -l gives the form stated in the theorem): 
Equation: 

$$y_i(l) = \sum_n h_i(n)\, d(Ml - n) = \sum_n h_i(n)\, g_j(Ml - n) = \delta(l)\, \delta(i - j).$$


Remark: Strictly speaking, in the superposition argument proving [link], one has to consider the input 
signals $x_i(n) = \delta(n - n_1)\, \delta(i - j)$ for arbitrary $n_1$. One readily verifies that for all $n_1$ [link] has to be 
satisfied. 


The equivalence of [link] and [link] when L = M is not obvious from the direct characterization. 
However, the transform domain characterization that we shall see shortly will make this connection 
obvious. For a PR filter bank, the L channels should contain sufficient information to reconstruct the 
original signal (note the summation over i in [link]), while for a transmultiplexer, the constituent channels 
should satisfy biorthogonality constraints so that they can be reconstructed from the composite signal 
(note the biorthogonality conditions suggested by [link]). 
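
The two conditions of the direct characterization can be evaluated term by term for a specific filter pair. The sketch below assumes L = M = 2, the Daubechies length-4 filters, and the unitary choice $g_i(n) = h_i(-n)$; it computes the filter bank sum $\sum_i \sum_n h_i(Mn + n_1)\, g_i(-Mn - n_2)$ and the transmultiplexer sum $\sum_n h_i(n)\, g_j(-Ml - n)$ over a finite index range that covers the filter supports. All function names are hypothetical.

```python
import numpy as np

s3 = np.sqrt(3.0)
h0 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))
h1 = np.array([h0[3], -h0[2], h0[1], -h0[0]])
h = [h0, h1]
M = 2

def tap(filt, n):
    """Filter coefficient at integer index n, zero outside the support 0..len-1."""
    return filt[n] if 0 <= n < len(filt) else 0.0

def g(i, n):
    """Assumed unitary synthesis filters g_i(n) = h_i(-n)."""
    return tap(h[i], -n)

def filter_bank_sum(n1, n2):
    return sum(tap(h[i], M * n + n1) * g(i, -M * n - n2)
               for i in range(2) for n in range(-8, 9))

def transmux_sum(i, j, l):
    return sum(tap(h[i], n) * g(j, -M * l - n) for n in range(-8, 9))

fb_err = max(abs(filter_bank_sum(n1, n2) - float(n1 == n2))
             for n1 in range(-3, 4) for n2 in range(-3, 4))
tm_err = max(abs(transmux_sum(i, j, l) - float(l == 0 and i == j))
             for i in range(2) for j in range(2) for l in range(-3, 4))
```

Both errors are at rounding level, consistent with the claim that the two conditions are equivalent when L = M.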


Matrix characterization of PR 


The second viewpoint is linear-algebraic in that it considers all signals as vectors and all filtering 
operations as matrix-vector multiplications [link]. In [link] and [link] the signals x(n), $d_i(n)$ and y(n) 
can be naturally associated with infinite vectors x, $d_i$ and y respectively. For example, 
$x = [\cdots, x(-1), x(0), x(1), \cdots]$. Then the analysis filtering operation can be expressed as 
Equation: 

$$d_i = H_i x, \quad \text{for } i \in \{0, 1, 2, \ldots, L - 1\},$$

where, for each i, $H_i$ is a matrix with entries appropriately drawn from filter $h_i$. $H_i$ is a block Toeplitz 
matrix (since it is obtained by retaining every $M$th row of the Toeplitz matrix representing convolution by 
$h_i$) with every row containing $h_i$ in an index-reversed order. Then the synthesis filtering operation can be 
expressed as 

Equation: 

$$y = \sum_{i} G_i^{T} d_i,$$

where, for each i, $G_i$ is a matrix with entries appropriately drawn from filter $g_i$. $G_i$ is also a block 
Toeplitz matrix (since it is obtained by retaining every $M$th row of the Toeplitz matrix whose transpose 
represents convolution by $g_i$) with every row containing $g_i$ in its natural order. Define d to be the vector 
obtained by interlacing the entries of each of the vectors $d_i$: 
$d = [\cdots, d_0(0), d_1(0), \cdots, d_{L-1}(0), d_0(1), d_1(1), \cdots]$. Also define the matrices H and G (in terms 
of $H_i$ and $G_i$) so that 

Equation: 

$$d = Hx, \quad \text{and} \quad y = G^{T} d.$$

H is obtained by interlacing the rows of $H_i$ and G is obtained by interlacing the rows of $G_i$. For 
example, in the FIR case if the filters are all of length N, 
Equation: 

$$H = \begin{bmatrix}
\ddots & & & & & & \\
h_0(N-1) & \cdots & h_0(N-M-1) & \cdots & h_0(0) & 0 & \cdots \\
\vdots & & \vdots & & \vdots & \vdots & \\
h_{L-1}(N-1) & \cdots & h_{L-1}(N-M-1) & \cdots & h_{L-1}(0) & 0 & \cdots \\
0 & \cdots & h_0(N-1) & \cdots & \cdots & \cdots & \cdots \\
 & & & & & & \ddots
\end{bmatrix}.$$


From this development, we have the following result: 


Theorem 39 A filter bank is PR iff 


Equation: 

$$G^{T} H = I.$$

A transmultiplexer is PR iff 
Equation: 

$$H G^{T} = I.$$


Moreover, when L = M, both conditions are equivalent. 
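
A finite, periodized version of these matrices makes the theorem easy to experiment with. The sketch below is an illustration under stated assumptions: circular indexing replaces the infinite matrices, the unitary choice $g_i(n) = h_i(-n)$ is used so that G = H, and two cases are compared. The first is the Daubechies length-4 bank with L = M = 2. The second is an undecimated Haar bank with L = 2, M = 1 and G = H/2, chosen to show that with L > M the filter bank condition can hold while the transmultiplexer condition fails. The helper `bank_matrix` is hypothetical.

```python
import numpy as np

def bank_matrix(filters, M, N):
    """Interlaced, periodized analysis matrix; row (n, i) holds h_i reversed, ending at column M*n."""
    rows = []
    for n in range(N // M):
        for h in filters:                      # interlace the L channels
            row = np.zeros(N)
            for k, hk in enumerate(h):
                row[(M * n - k) % N] += hk     # d_i(n) = sum_k h_i(k) x(M n - k)
            rows.append(row)
    return np.array(rows)

s3 = np.sqrt(3.0)
h0 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))
h1 = np.array([h0[3], -h0[2], h0[1], -h0[0]])

# Case 1: L = M = 2, unitary, so G = H and both PR conditions hold.
H = bank_matrix([h0, h1], M=2, N=8)
G = H
fb_ok = np.allclose(G.T @ H, np.eye(8))
tm_ok = np.allclose(H @ G.T, np.eye(8))

# Case 2: L = 2 > M = 1 (undecimated Haar). G = H / 2 gives filter bank PR only.
a = 1.0 / np.sqrt(2.0)
Hf = bank_matrix([np.array([a, a]), np.array([a, -a])], M=1, N=8)   # 16 x 8
Gf = Hf / 2.0
frame_fb_ok = np.allclose(Gf.T @ Hf, np.eye(8))
frame_tm_ok = np.allclose(Hf @ Gf.T, np.eye(16))
```

In the second case the product $H G^{T}$ is a 16-by-16 matrix of rank 8, so it cannot be the identity.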


One can also write the PR conditions for filter banks and transmultiplexers in the following form, which
explicitly shows the formal relationship between the direct and matrix characterizations. For a PR filter
bank we have

Equation:

$$\sum_i \mathbf{G}_i^T \mathbf{H}_i = \mathbf{I}.$$

Correspondingly for a PR transmultiplexer we have

Equation:

$$\mathbf{H}_i \mathbf{G}_j^T = \delta(i - j)\,\mathbf{I}.$$
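The matrix conditions can be checked numerically on a finite stand-in. The sketch below assumes periodic signals of length `n`, so the infinite block Toeplitz matrices become finite block circulant ones. The helper name `analysis_matrix` is hypothetical, and the test filters are the Daubechies length-4 pair with $g_i(n) = h_i(-n)$, for which $\mathbf{G} = \mathbf{H}$.

```python
import numpy as np

def analysis_matrix(h, M, n):
    """Circular stand-in for H_i: row r holds h index-reversed, shifted by M*r."""
    H = np.zeros((n // M, n))
    for r in range(n // M):
        for k, hk in enumerate(h):
            # d_i(r) = sum_k h(k) x(M r - k), with indices taken modulo n
            H[r, (M * r - k) % n] += hk
    return H

s3 = np.sqrt(3.0)
h0 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))
h1 = np.array([(-1) ** k * h0[3 - k] for k in range(4)])

M, n = 2, 8
H0, H1 = analysis_matrix(h0, M, n), analysis_matrix(h1, M, n)

# Interlace the rows of H0 and H1 to obtain H; with g_i(n) = h_i(-n), G = H.
H = np.empty((n, n))
H[0::2], H[1::2] = H0, H1
G = H

print(np.allclose(G.T @ H, np.eye(n)))                 # filter bank PR
print(np.allclose(H @ G.T, np.eye(n)))                 # transmultiplexer PR
print(np.allclose(H0.T @ H0 + H1.T @ H1, np.eye(n)))   # sum_i G_i^T H_i = I
print(np.allclose(H0 @ H1.T, 0.0))                     # H_i G_j^T = 0 for i != j
```

Because $L = M = 2$ here, the filter bank and transmultiplexer conditions hold together, as the theorem states.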


Polyphase (Transform-Domain) Characterization of PR 


We finally look at the analysis and synthesis filter banks from a polyphase representation viewpoint. Here
subsequences of the input and output signals and the filters are represented in the z-transform domain.
Indeed let the z-transforms of the signals and filters be expressed in terms of the z-transforms of their
subsequences as follows:

Equation:

$$X(z) = \sum_{k=0}^{M-1} z^{k} X_k(z^M)$$

Equation:

$$Y(z) = \sum_{k=0}^{M-1} z^{k} Y_k(z^M)$$

Equation:

$$H_i(z) = \sum_{k=0}^{M-1} z^{-k} H_{i,k}(z^M)$$

Equation:

$$G_i(z) = \sum_{k=0}^{M-1} z^{k} G_{i,k}(z^M)$$


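Then, along each branch of the analysis bank we have

Equation:

$$D_i(z) = \sum_{k=0}^{M-1} H_{i,k}(z) X_k(z).$$

Similarly, from the synthesis bank, we have

Equation:

$$Y(z) = \sum_{i=0}^{L-1} D_i(z^M)\, G_i(z) = \sum_{k=0}^{M-1} z^{k} \sum_{i=0}^{L-1} G_{i,k}(z^M)\, D_i(z^M),$$

and therefore (from [link])

Equation:

$$Y_k(z) = \sum_{i=0}^{L-1} G_{i,k}(z) D_i(z).$$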


For $i \in \{0, 1, \ldots, L-1\}$ and $k \in \{0, 1, \ldots, M-1\}$, define the polyphase component matrices
$(H_p(z))_{i,k} = H_{i,k}(z)$ and $(G_p(z))_{i,k} = G_{i,k}(z)$. Let $X_p(z)$ and $Y_p(z)$ denote the z-transforms of the
polyphase signals $x_p(n)$ and $y_p(n)$, and let $D_p(z)$ be the vector whose components are $D_i(z)$.
Equations [link] and [link] can be written compactly as

Equation:

$$D_p(z) = H_p(z) X_p(z),$$

Equation:

$$Y_p(z) = G_p^T(z) D_p(z),$$

and

Equation:

$$Y_p(z) = G_p^T(z) H_p(z) X_p(z).$$

Thus, the analysis filter bank is represented by the multi-input (the polyphase components of $X(z)$),
multi-output (the signals $D_i(z)$) linear-shift-invariant system $H_p(z)$ that takes in $X_p(z)$ and gives out
$D_p(z)$. Similarly, the synthesis filter bank can be interpreted as a multi-input (the signals $D_i(z)$),
multi-output (the polyphase components of $Y(z)$) system $G_p^T(z)$, which maps $D_p(z)$ to $Y_p(z)$. Clearly we
have PR iff $Y_p(z) = X_p(z)$. This occurs precisely when $G_p^T(z) H_p(z) = I$.

For the transmultiplexer problem, let $Y_p(z)$ and $X_p(z)$ be vectorized versions of the input and output
signals respectively and let $D_p(z)$ be the generalized polyphase representation of the signal $D(z)$. Now
$D_p(z) = G_p^T(z) Y_p(z)$ and $X_p(z) = H_p(z) D_p(z)$. Hence $X_p(z) = H_p(z) G_p^T(z) Y_p(z)$, and for PR
$H_p(z) G_p^T(z) = I$.


Theorem 40 A filter bank has the PR property if and only if

Equation:

$$G_p^T(z) H_p(z) = I.$$

A transmultiplexer has the PR property if and only if

Equation:

$$H_p(z) G_p^T(z) = I,$$

where $H_p(z)$ and $G_p(z)$ are as defined above.


Remark: If $G_p^T(z) H_p(z) = I$, then $H_p(z)$ must have at least as many rows as columns (i.e., $L \geq M$ is
necessary for a filter bank to be PR). If $H_p(z) G_p^T(z) = I$ then $H_p(z)$ must have at least as many
columns as rows (i.e., $M \geq L$ is necessary for a transmultiplexer to be PR). If $L = M$, the matrices are square, so
$G_p^T(z) H_p(z) = I = H_p(z) G_p^T(z)$ and hence a filter bank is PR iff the corresponding transmultiplexer
is PR. This equivalence is trivial with the polyphase representation, while it is not in the direct and matrix
representations.


Notice that the PR property of a filter bank or transmultiplexer is unchanged if all the analysis and
synthesis filters are shifted by the same amount in opposite directions. Also any one analysis/synthesis
filter pair can be shifted by multiples of $M$ in opposite directions without affecting the PR property.
Using these two properties (and assuming all the filters are FIR), we can assume without loss of
generality that the analysis filters are supported in $[0, N-1]$ (for some integer $N$). This also implies that
$H_p(z)$ is a polynomial in $z^{-1}$, a fact that we will use in the parameterization of an important class of
filter banks: unitary filter banks.

All through the discussion of the PR property of filter banks, we have deliberately not said anything about
the length of the filters. The bilinear PR constraints are completely independent of filter lengths and hold
for arbitrary sequences. However, if the sequences are infinite then one requires that the infinite
summations over $n$ in [link] and [link] converge. Clearly, assuming that these filter sequences are in
$\ell^2(\mathbb{Z})$ is sufficient to ensure this since inner products are then well-defined.


Unitary Filter Banks 


From [link] it follows that a filter bank can be ensured to be PR if the analysis filters are chosen such that
$\mathbf{H}$ is left-unitary, i.e., $\mathbf{H}^T\mathbf{H} = \mathbf{I}$. In this case, the synthesis matrix $\mathbf{G} = \mathbf{H}$ (from [link]) and therefore
$\mathbf{G}_i = \mathbf{H}_i$ for all $i$. Recall that the rows of $\mathbf{G}_i$ contain $g_i$ in natural order while the rows of $\mathbf{H}_i$ contain
$h_i$ in index-reversed order. Therefore, for such a filter bank, since $\mathbf{G}_i = \mathbf{H}_i$, the synthesis filters are
reflections of the analysis filters about the origin; i.e., $g_i(n) = h_i(-n)$. Filter banks where the analysis
and synthesis filters satisfy this reflection relationship are called unitary (or orthogonal) filter banks for
the simple reason that $\mathbf{H}$ is left-unitary. In a similar fashion, it is easy to see that if $\mathbf{H}$ is right-unitary (i.e.,
$\mathbf{H}\mathbf{H}^T = \mathbf{I}$), then the transmultiplexer associated with this set of analysis filters is PR with
$g_i(n) = h_i(-n)$. This defines unitary transmultiplexers.

We now examine how the three ways of viewing PR filter banks and transmultiplexers simplify when we
focus on unitary ones. Since $g_i(n) = h_i(-n)$, the direct characterization becomes the following:


Theorem 41 A filter bank is unitary iff

Equation:

$$\sum_i \sum_n h_i(Mn + n_1)\, h_i(Mn + n_2) = \delta(n_1 - n_2).$$

A transmultiplexer is unitary iff

Equation:

$$\sum_n h_i(n)\, h_j(Ml + n) = \delta(l)\,\delta(i - j).$$

If the number of channels is equal to the downsampling factor, then a filter bank is unitary iff the
corresponding transmultiplexer is unitary.


The matrix characterization of unitary filter banks/transmultiplexers should be clear from the above 
discussion: 


Theorem 42 A filter bank is unitary iff $\mathbf{H}^T\mathbf{H} = \mathbf{I}$, and a transmultiplexer is unitary iff $\mathbf{H}\mathbf{H}^T = \mathbf{I}$.

The z-transform domain characterization of unitary filter banks and transmultiplexers is given below:

Theorem 43 A filter bank is unitary iff $H_p^T(z^{-1}) H_p(z) = I$, and a transmultiplexer is unitary iff
$H_p(z) H_p^T(z^{-1}) = I$.
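As a rough illustration of the z-domain condition, the sketch below forms the polyphase coefficient matrices of an FIR bank and evaluates the coefficients of $H_p^T(z^{-1}) H_p(z)$ lag by lag. The helper names `polyphase_coeffs` and `paraunitary_lag` are hypothetical, the filter length is assumed to be a multiple of $M$, and the Daubechies length-4 pair is used as test data.

```python
import numpy as np

def polyphase_coeffs(filters, M):
    """hp[k, i, j] = h_i(M*k + j), so that H_p(z) = sum_k hp[k] z^{-k}."""
    filters = np.asarray(filters, dtype=float)
    L, N = filters.shape              # assumes N is a multiple of M
    return filters.reshape(L, N // M, M).transpose(1, 0, 2)

def paraunitary_lag(hp, m):
    """Coefficient of z^{-m} in H_p^T(1/z) H_p(z), for 0 <= m < len(hp)."""
    return sum(hp[j].T @ hp[j + m] for j in range(hp.shape[0] - m))

s3 = np.sqrt(3.0)
h0 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))
h1 = np.array([(-1) ** n * h0[3 - n] for n in range(4)])
hp = polyphase_coeffs([h0, h1], M=2)

lag0 = paraunitary_lag(hp, 0)   # expected to be the identity
lag1 = paraunitary_lag(hp, 1)   # expected to vanish
print(np.round(lag0, 12))
print(np.round(lag1, 12))
```

The negative lags are the transposes of the positive ones, so checking $m \geq 0$ is enough.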


In this book (as in most of the work in the literature) one primarily considers the situation where the
number of channels equals the downsampling factor. For such a unitary filter bank (transmultiplexer),
[link] and [link] become:

Equation:

$$\sum_i \mathbf{H}_i^T \mathbf{H}_i = \mathbf{I},$$

and

Equation:

$$\mathbf{H}_i \mathbf{H}_j^T = \delta(i - j)\,\mathbf{I}.$$

The matrices $\mathbf{H}_i$ are pairwise orthogonal and form a resolution of the identity matrix. In other words, for
each $i$, $\mathbf{H}_i^T\mathbf{H}_i$ is an orthogonal projection matrix and the filter bank gives an orthogonal decomposition
of a given signal. Recall that for a matrix $\mathbf{P}$ to be an orthogonal projection matrix, $\mathbf{P}^2 = \mathbf{P}$ and $\mathbf{P} \geq 0$;
in our case, indeed, we do have $\mathbf{H}_i^T\mathbf{H}_i \geq 0$ and $\mathbf{H}_i^T\mathbf{H}_i\mathbf{H}_i^T\mathbf{H}_i = \mathbf{H}_i^T\mathbf{H}_i$.


Unitarity is a very useful constraint since it leads to orthogonal decompositions. Besides, for a unitary 
filter bank, one does not have to design both the analysis and synthesis filters since $h_i(n) = g_i(-n)$.
But perhaps the most important property of unitary filter banks and transmultiplexers is that they can be 
parameterized. As we have already seen, filter bank design is a nonlinear optimization (of some goodness 
criterion) problem subject to PR constraints. If the PR constraints are unitary, then a parameterization of 
unitary filters leads to an unconstrained optimization problem. Besides, for designing wavelets with high- 
order vanishing moments, nonlinear equations can be formulated and solved in this parameter space. A 
similar parameterization of nonunitary PR filter banks and transmultiplexers seems impossible and it is 
not too difficult to intuitively see why. Consider the following analogy: a PR filter bank is akin to a left- 
invertible matrix and a PR transmultiplexer to a right-invertible matrix. If L = M, the PR filter bank is 
akin to an invertible matrix. A unitary filter bank is akin to a left-unitary matrix, a unitary 
transmultiplexer to a right-unitary matrix, and when L = M, either of them to a unitary matrix. Left- 
unitary, right-unitary and in particular unitary matrices can be parameterized using Givens' rotations or 
Householder transformations [link]. However, left-invertible, right-invertible and, in particular, invertible 
matrices have no general parameterization. Also, unitariness allows explicit parameterization of filter 
banks and transmultiplexers which just PR alone precludes. The analogy is even more appropriate: There 
are two parameterizations of unitary filter banks and transmultiplexers that correspond to Givens' rotation 
and Householder transformations, respectively. All our discussions on filter banks and transmultiplexers 
carry over naturally with very small notational changes to the multi-dimensional case where
downsampling is by some integer matrix [link]. However, the parameterization result we now proceed to
develop is not known in the multi-dimensional case. In the two-dimensional case, however, an implicit, 
and perhaps not too practical (from a filter-design point of view), parameterization of unitary filter banks 
is described in [link]. 


Consider a unitary filter bank with finite-impulse response filters (i.e., for all $i$, $h_i$ is a finite sequence).
Recall that without loss of generality, the filters can be shifted so that $H_p(z)$ is a polynomial in $z^{-1}$. In
this case $G_p(z) = H_p(z^{-1})$ is a polynomial in $z$. Let

Equation:

$$H_p(z) = \sum_{k=0}^{K-1} h_p(k)\, z^{-k}.$$

That is, $H_p(z)$ is a matrix polynomial in $z^{-1}$ with coefficients $h_p(k)$ and degree $K-1$. Since
$H_p^T(z^{-1}) H_p(z) = I$, from [link] we must have $h_p^T(0)\, h_p(K-1) = 0$, as it is the coefficient of $z^{-(K-1)}$
in the product $H_p^T(z^{-1}) H_p(z)$. Therefore $h_p(0)$ is singular. Let $P_{K-1}$ be the unique projection matrix
onto the range of $h_p(K-1)$ (say of dimension $\delta_{K-1}$). Then $h_p(0)^T P_{K-1} = 0 = P_{K-1} h_p(0)$. Also
$P_{K-1} h_p(K-1) = h_p(K-1)$ and hence $(I - P_{K-1})\, h_p(K-1) = 0$. Now $[I - P_{K-1} + z P_{K-1}]\, H_p(z)$
is a matrix polynomial of degree at most $K-2$. If $h_p(0)$ and $h_p(K-1)$ are nonzero (an assumption one
makes without loss of generality), the degree is precisely $K-2$. Also it is unitary since
$I - P_{K-1} + z P_{K-1}$ is unitary. Repeated application of this procedure $(K-1)$ times gives a degree zero
(constant) unitary matrix $V_0$. The discussion above shows that an arbitrary unitary polynomial matrix of
degree $K-1$ can be expressed algorithmically uniquely as described in the following theorem:


Theorem 44 For a polynomial matrix $H_p(z)$, unitary on the unit circle (i.e., $H_p^T(z^{-1}) H_p(z) = I$), and
of polynomial degree $K-1$, there exists a unique set of projection matrices $P_k$ (each of rank some
integer $\delta_k$), such that

Equation:

$$H_p(z) = \left\{ \prod_{k=K-1}^{1} \left[ I - P_k + z^{-1} P_k \right] \right\} V_0.$$


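Remark: Since the projection $P_k$ is of rank $\delta_k$, it can be written as $v_1 v_1^T + \ldots + v_{\delta_k} v_{\delta_k}^T$ for a nonunique
set of orthonormal vectors $v_i$. Using the fact that

Equation:

$$\left[ I - v_j v_j^T - v_{j-1} v_{j-1}^T + z^{-1}\left( v_j v_j^T + v_{j-1} v_{j-1}^T \right) \right] = \prod_{i=j}^{j-1} \left[ I - v_i v_i^T + z^{-1} v_i v_i^T \right],$$

defining $\Delta = \sum_k \delta_k$ and collecting all the $v_j$'s that define the $P_k$'s into a single pool (and reindexing) we
get the following factorization:

Equation:

$$H_p(z) = \left\{ \prod_{k=\Delta}^{1} \left[ I - v_k v_k^T + z^{-1} v_k v_k^T \right] \right\} V_0.$$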
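The factorization can also be read constructively: any choice of unit vectors $v_k$ and any orthogonal $V_0$ yields a polynomial matrix that is unitary on the unit circle. The following sketch builds the coefficient matrices of such a product and checks unitarity at one frequency. The function names and the random test data are illustrative assumptions, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_unit_vector(M):
    v = rng.standard_normal(M)
    return v / np.linalg.norm(v)

def householder_polyphase(vs, V0):
    """Coefficients hp[k] of H_p(z) = prod_k [I - v_k v_k^T + z^{-1} v_k v_k^T] V0.

    vs = [v_1, ..., v_Delta]; v_1 is the factor adjacent to V0.
    """
    M = V0.shape[0]
    hp = [V0.copy()]                        # degree-0 polynomial
    for v in vs:                            # left-multiply one factor at a time
        P = np.outer(v, v)
        new = [np.zeros((M, M)) for _ in range(len(hp) + 1)]
        for k, c in enumerate(hp):
            new[k] += (np.eye(M) - P) @ c   # z^0 part of the factor
            new[k + 1] += P @ c             # z^{-1} part of the factor
        hp = new
    return np.array(hp)

M = 3
V0, _ = np.linalg.qr(rng.standard_normal((M, M)))   # a random orthogonal matrix
vs = [random_unit_vector(M) for _ in range(2)]
hp = householder_polyphase(vs, V0)

w = 0.7                                              # an arbitrary test frequency
Hw = sum(c * np.exp(-1j * w * k) for k, c in enumerate(hp))
print(np.allclose(Hw.conj().T @ Hw, np.eye(M)))
```

With two rank-one factors the result has polynomial degree 2, i.e. filters of length at most $3M$.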


If $H_p(z)$ is the analysis bank of a filter bank, then notice that $\Delta$ (from [link]) is the number of storage
elements required to implement the analysis bank. The minimum number of storage elements to
implement any transfer function is called the McMillan degree and in this case $\Delta$ is indeed the McMillan
degree [link]. Recall that $P_{K-1}$ is chosen to be the projection matrix onto the range of $h_p(K-1)$. Instead
we could have chosen $P_{K-1}$ to be the projection onto the nullspace of $h_p^T(0)$ (which contains the range of
$h_p(K-1)$) or any space sandwiched between the two. Each choice leads to a different sequence of
factors $P_k$ and corresponding $\delta_k$ (except when the range and nullspaces in question coincide at some
stage during the order reduction process). However, $\Delta$, the McMillan degree, is constant.


Equation [link] can be used as a starting point for filter bank design. It parameterizes all unitary filter
banks with McMillan degree $\Delta$. If $\Delta = K$, then all unitary filter banks with filters of length $N \leq MK$
are parameterized using a collection of $K-1$ unitary vectors, $v_k$, and a unitary matrix, $V_0$. Each unitary
vector has $(M-1)$ free parameters, while the unitary matrix has $M(M-1)/2$ free parameters for a
total of $(K-1)(M-1) + \binom{M}{2}$ free parameters for $H_p(z)$. The filter bank design problem is to choose
these free parameters to optimize the "usefulness" criterion of the filter bank.


If $L > M$, and $H_p(z)$ is left-unitary, a similar analysis leads to exactly the same factorization as before
except that $V_0$ is a left-unitary matrix. In this case, the number of free parameters is given by
$(K-1)(L-1) + \binom{L}{2} - \binom{L-M}{2}$. For a transmultiplexer with $L < M$, one can use the same factorization
above for $H_p^T(z)$ (which is left-unitary). Even for a filter bank or transmultiplexer with $L = M$,
factorizations of left-/right-unitary $H_p(z)$ are useful for the following reason. Let us assume that a subset
of the analysis filters has been predesigned (for example in wavelet theory one sometimes independently
designs $h_0$ to be a $K$-regular scaling filter, as in Chapter: Regularity, Moments, and Wavelet System
Design). The submatrix of $H_p(z)$ corresponding to this subset of filters is right-unitary, hence its
transpose can be parameterized as above with a collection of vectors $v_i$ and a left-unitary $V_0$. Each choice
for the remaining columns of $V_0$ gives a choice for the remaining filters in the filter bank. In fact, all
possible completions of the original subset with fixed McMillan degree are given this way.


Orthogonal filter banks are sometimes referred to as lossless filter banks because the collective energy of
the subband signals is the same as that of the original signal. If $U$ is an orthogonal matrix, then the signals
$x(n)$ and $Ux(n)$ have the same energy. If $P$ is an orthogonal projection matrix, then

Equation:

$$\| x \|^2 = \| P x \|^2 + \| (I - P) x \|^2.$$

For any given $X(z)$, $X(z)$ and $z^{-1}X(z)$ have the same energy. Using the above facts, we find that for any
projection matrix, $P$,

Equation:

$$D_p(z) = \left[ I - P + z^{-1} P \right] X_p(z) \overset{\text{def}}{=} T(z)\, X_p(z)$$

has the same energy as $X_p(z)$. This is equivalent to the fact that $T(z)$ is unitary on the unit circle (one
can directly verify this). Therefore (from [link]) it follows that the subband signals have the same energy
as the original signal.
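The energy argument can be illustrated in the time domain. The sketch below applies $T(z) = I - P + z^{-1}P$ to a random finite-length vector signal, i.e. $d(m) = (I - P)\,x(m) + P\,x(m-1)$, and compares energies. The dimensions and the random data are arbitrary assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
M, n = 3, 16

v = rng.standard_normal(M)
v /= np.linalg.norm(v)
P = np.outer(v, v)                          # rank-one orthogonal projection

x = rng.standard_normal((n, M))             # row m holds the vector x_p(m)
xpad = np.vstack([x, np.zeros((1, M))])     # x(m) for m = 0..n
delayed = np.vstack([np.zeros((1, M)), x])  # x(m-1) for m = 0..n

# d(m) = (I - P) x(m) + P x(m-1), written for row vectors
d = xpad @ (np.eye(M) - P).T + delayed @ P.T

print(np.sum(x ** 2), np.sum(d ** 2))
```

The two printed energies agree because $(I - P)x(m)$ and $P x(m-1)$ are orthogonal for every $m$.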


In order to make the free parameters explicit for filter design, we now describe $V_0$ and $\{v_i\}$ using angle
parameters. First consider $v_i$, with $\|v_i\| = 1$. Clearly, $v_i$ has $(M-1)$ degrees of freedom. One way to
parameterize $v_i$ using $(M-1)$ angle parameters $\theta_{i,k}$, $k \in \{0, 1, \ldots, M-2\}$ would be to define the
components of $v_i$ as follows:

Equation:

$$(v_i)_j = \begin{cases}
\left\{ \prod_{l=0}^{j-1} \sin(\theta_{i,l}) \right\} \cos(\theta_{i,j}) & \text{for } j \in \{0, 1, \ldots, M-2\} \\
\left\{ \prod_{l=0}^{M-2} \sin(\theta_{i,l}) \right\} & \text{for } j = M-1.
\end{cases}$$
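A minimal sketch of this angle parameterization follows; the function name `unit_vector_from_angles` is hypothetical. It maps $M-1$ angles to a vector of length $M$ whose norm is one by construction.

```python
import numpy as np

def unit_vector_from_angles(theta):
    """Map M-1 angles to a unit vector of length M (spherical coordinates)."""
    theta = np.asarray(theta, dtype=float)
    M = theta.size + 1
    v = np.empty(M)
    sin_prod = 1.0                       # running product sin(theta_0)...sin(theta_{j-1})
    for j in range(M - 1):
        v[j] = sin_prod * np.cos(theta[j])
        sin_prod *= np.sin(theta[j])
    v[M - 1] = sin_prod                  # last component is the full product of sines
    return v

v = unit_vector_from_angles([0.3, 1.1, 2.0])
print(v, np.linalg.norm(v))
```

Since every set of angles gives a valid unit vector, an optimizer can search over the angles without any constraint.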


As for $V_0$, it being an $M \times M$ orthogonal matrix, it has $\binom{M}{2}$ degrees of freedom. There are two well
known parameterizations of constant orthogonal matrices, one based on Givens rotations (well known in
QR factorization etc. [link]), and another based on Householder reflections. In the Householder
parameterization

Equation:

$$V_0 = \prod_{i=0}^{M-1} \left[ I - 2 v_i v_i^T \right],$$

where $v_i$ are unit norm vectors with the first $i$ components of $v_i$ being zero. Each matrix factor
$[I - 2 v_i v_i^T]$, when multiplied by a vector $q$, reflects $q$ about the plane perpendicular to $v_i$, hence the
name Householder reflections. Since the first $i$ components of $v_i$ are zero, and $\|v_i\| = 1$, $v_i$ has
$M - i - 1$ degrees of freedom. Each being a unit vector, they can be parameterized as before using
$M - i - 1$ angles. Therefore, the total degrees of freedom are

Equation:

$$\sum_{i=0}^{M-1} (M - 1 - i) = \sum_{i=0}^{M-1} i = \binom{M}{2}.$$

In summary, any orthogonal matrix can be factored into a cascade of $M$ reflections about the planes
perpendicular to the vectors $v_i$.
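The Householder construction of a constant orthogonal matrix can be sketched as follows. The vectors are drawn at random here purely for illustration (an assumption; in a design problem they would come from angle parameters), with the first $i$ components of the $i$th vector forced to zero.

```python
import numpy as np

rng = np.random.default_rng(2)
M = 4

V0 = np.eye(M)
for i in range(M):
    v = np.zeros(M)
    v[i:] = rng.standard_normal(M - i)        # first i components stay zero
    v /= np.linalg.norm(v)
    V0 = V0 @ (np.eye(M) - 2.0 * np.outer(v, v))   # one Householder reflection

# Degrees of freedom: M - 1 - i angles for the i-th vector
dof = sum(M - 1 - i for i in range(M))

print(np.allclose(V0.T @ V0, np.eye(M)), dof)
```

Note that the last vector has no freedom left (it is plus or minus the last unit vector), which is why the count stops at $M(M-1)/2$.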


Notice the similarity between Householder reflection factors for $V_0$ and the factors of $H_p(z)$ in [link].
Based on this similarity, the factorization of unitary matrices and vectors in this section is called the
Householder factorization. Analogous to the Givens factorization for constant unitary matrices, one
can also obtain a factorization of unitary matrices $H_p(z)$ and unitary vectors $V(z)$ [link]. However, from the
points of view of filter bank theory and wavelet theory, the Householder factorization is simpler to
understand and implement except when $M = 2$.


Perhaps the simplest and most popular way to represent a $2 \times 2$ unitary matrix is by a rotation parameter
(not by a Householder reflection parameter). Therefore, the simplest way to represent a unitary $2 \times 2$
matrix $H_p(z)$ is using a lattice parameterization using Givens rotations. Since two-channel unitary filter
banks play an important role in the theory and design of unitary modulated filter banks (that we will
shortly address), we present the lattice parameterization [link]. The lattice parameterization is also
obtained by an order-reduction procedure like the one we saw while deriving the Householder-type factorization in
[link].


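Theorem 45 Every unitary $2 \times 2$ matrix $H_p(z)$ (in particular the polyphase matrix of a two-channel FIR
unitary filter bank) is of the form

Equation:

$$H_p(z) = \begin{bmatrix} 1 & 0 \\ 0 & \pm 1 \end{bmatrix} R(\theta_{K-1})\, Z\, R(\theta_{K-2})\, Z \cdots Z\, R(\theta_1)\, Z\, R(\theta_0),$$

where

Equation:

$$R(\theta) = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \quad \text{and} \quad Z = \begin{bmatrix} 1 & 0 \\ 0 & z^{-1} \end{bmatrix}.$$

Equation [link] is the unitary lattice parameterization of $H_p(z)$. The filters $H_0(z)$ and $H_1(z)$ are given
by

Equation:

$$\begin{bmatrix} H_0(z) \\ H_1(z) \end{bmatrix} = H_p(z^2) \begin{bmatrix} 1 \\ z^{-1} \end{bmatrix}.$$

By changing the sign of the filter $h_1(n)$, if necessary, one can always write $H_p(z)$ in the form

Equation:

$$H_p(z) = R(\theta_{K-1})\, Z\, R(\theta_{K-2})\, Z \cdots Z\, R(\theta_0).$$

Now, if $H_{0,j}^R(z)$ is the reflection of $H_{0,j}(z)$ (i.e., $H_{0,j}^R(z) = z^{-K+1} H_{0,j}(z^{-1})$), then (from the algebraic
form of $R(\theta)$)

Equation:

$$\begin{bmatrix} H_{0,0}(z) & H_{0,1}(z) \\ H_{1,0}(z) & H_{1,1}(z) \end{bmatrix} = \begin{bmatrix} H_{0,0}(z) & H_{0,1}(z) \\ -H_{0,1}^R(z) & H_{0,0}^R(z) \end{bmatrix}.$$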

With these parameterizations, filter banks can be designed as an unconstrained optimization problem. The 
parameterizations described are important for another reason. It turns out that the most efficient (from the 
number of arithmetic operations) implementation of unitary filter banks is using the Householder 
parameterization. With arbitrary filter banks, one can organize the computations so as to capitalize on the
rate-change operations of upsampling and downsampling. For example, one need not compute values that 
are thrown away by downsampling. The gain from using the parameterization of unitary filter banks is 
over and above this obvious gain (for example, see pages 330-331 and 386-387 in [link]). Besides, with 
small modifications these parameterizations allow for unitariness to be preserved, even under filter 
coefficient quantization—with this having implications for fixed-point implementation of these filter 
banks in hardware digital signal processors [link]. 


Unitary Filter Banks—Some Illustrative Examples 


A few concrete examples of $M$-band unitary filter banks and their parameterizations should clarify our
discussion.


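First consider the two-band filter bank associated with Daubechies' four-coefficient scaling function and
wavelet that we saw in Section: Parameterization of the Scaling Coefficients. Recall that the lowpass
filter (the scaling filter) is given by

$$\begin{array}{c|cccc}
n & 0 & 1 & 2 & 3 \\ \hline
h_0(n) & \dfrac{1+\sqrt{3}}{4\sqrt{2}} & \dfrac{3+\sqrt{3}}{4\sqrt{2}} & \dfrac{3-\sqrt{3}}{4\sqrt{2}} & \dfrac{1-\sqrt{3}}{4\sqrt{2}}
\end{array}$$

The highpass filter (wavelet filter) is given by $h_1(n) = (-1)^n h_0(3 - n)$, and both [link] and [link] are
satisfied with $g_i(n) = h_i(-n)$. The matrix representation of the analysis bank of this filter bank is given
by

Equation:

$$\mathbf{d} = \mathbf{H}\mathbf{x} = \frac{1}{4\sqrt{2}} \begin{bmatrix}
\ddots & & & & & & \\
1-\sqrt{3} & 3-\sqrt{3} & 3+\sqrt{3} & 1+\sqrt{3} & 0 & 0 & \\
-(1+\sqrt{3}) & 3+\sqrt{3} & -(3-\sqrt{3}) & 1-\sqrt{3} & 0 & 0 & \\
0 & 0 & 1-\sqrt{3} & 3-\sqrt{3} & 3+\sqrt{3} & 1+\sqrt{3} & \\
0 & 0 & -(1+\sqrt{3}) & 3+\sqrt{3} & -(3-\sqrt{3}) & 1-\sqrt{3} & \\
 & & & & & & \ddots
\end{bmatrix} \mathbf{x}.$$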


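One readily verifies that $\mathbf{H}^T\mathbf{H} = \mathbf{I}$ and $\mathbf{H}\mathbf{H}^T = \mathbf{I}$. The polyphase representation of this filter bank is
given by

Equation:

$$H_p(z) = \frac{1}{4\sqrt{2}} \begin{bmatrix}
(1+\sqrt{3}) + z^{-1}(3-\sqrt{3}) & (3+\sqrt{3}) + z^{-1}(1-\sqrt{3}) \\
(1-\sqrt{3}) + z^{-1}(3+\sqrt{3}) & (-3+\sqrt{3}) - z^{-1}(1+\sqrt{3})
\end{bmatrix},$$

and one can show that $H_p^T(z^{-1}) H_p(z) = I$ and $H_p(z) H_p^T(z^{-1}) = I$. The Householder factorization
of $H_p(z)$ is given by

Equation:

$$H_p(z) = \left[ I - v_1 v_1^T + z^{-1} v_1 v_1^T \right] V_0,$$

where

Equation:

$$v_1 = \begin{bmatrix} \sin(\pi/12) \\ \cos(\pi/12) \end{bmatrix} \quad \text{and} \quad V_0 = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{bmatrix}.$$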

Incidentally, all two-band unitary filter banks associated with wavelet tight frames have the same value of
$V_0$. Therefore, all filter banks associated with two-band wavelet tight frames are completely specified by
a set of unit vectors $v_i$, $K - 1$ of them if $h_0$ is of length $2K$. Indeed, for the six-coefficient
Daubechies wavelets (see Section: Parameterization of the Scaling Coefficients), the parameterization of
$H_p(z)$ is associated with the following two unitary vectors (since $K = 3$): $v_1^T = [-.3842 \;\; .9232]$ and
$v_2^T = [-.1053 \;\; -.9944]$.
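The Householder factorization of the four-coefficient Daubechies bank can be confirmed numerically. The sketch below assumes the convention $(h_p(k))_{i,j} = h_i(2k + j)$ for the polyphase coefficients and compares them with $(I - v_1 v_1^T) V_0$ and $v_1 v_1^T V_0$.

```python
import numpy as np

s3 = np.sqrt(3.0)
h0 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))
h1 = np.array([(-1) ** n * h0[3 - n] for n in range(4)])

# Polyphase coefficients: H_p(z) = hp0 + hp1 z^{-1}, with (hp_k)[i, j] = h_i(2k + j)
hp0 = np.array([h0[:2], h1[:2]])
hp1 = np.array([h0[2:], h1[2:]])

v1 = np.array([np.sin(np.pi / 12), np.cos(np.pi / 12)])
V0 = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
P = np.outer(v1, v1)

print(np.allclose((np.eye(2) - P) @ V0, hp0))   # z^0 coefficient
print(np.allclose(P @ V0, hp1))                 # z^{-1} coefficient
```

Evaluating at $z = 1$ gives $hp0 + hp1 = V_0$, which is how $V_0$ is read off from the filters.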


The Givens rotation based factorization of $H_p(z)$ for the 4-coefficient Daubechies filters is given by:

Equation:

$$H_p(z) = \begin{bmatrix} \cos\theta_1 & z^{-1}\sin\theta_1 \\ -\sin\theta_1 & z^{-1}\cos\theta_1 \end{bmatrix} \begin{bmatrix} \cos\theta_0 & \sin\theta_0 \\ -\sin\theta_0 & \cos\theta_0 \end{bmatrix},$$

where $\theta_0 = \frac{\pi}{3}$ and $\theta_1 = -\frac{\pi}{12}$ (up to a change of sign of the filter $h_1$). The fact that the filter bank is associated with wavelets is precisely
because $\theta_0 + \theta_1 = \frac{\pi}{4}$. More generally, for a filter bank with filters of length $2K$ to be associated with
wavelets, $\sum_{k=0}^{K-1} \theta_k = \frac{\pi}{4}$. This is expected since for filters of length $2K$ to be associated with wavelets
we have seen (from the Householder factorization) that there are $K - 1$ parameters $v_k$.

Our second example belongs to a class of unitary filter banks called modulated filter banks, which is described in a
following section. A Type 1 modulated filter bank with filters of length $N = 2M$ and associated with a
wavelet orthonormal basis is defined by

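Equation:

$$h_i(n) = \sqrt{\frac{1}{2M}} \left[ \sin\left( \frac{\pi (i+1)(n + .5)}{M} - (2i+1)\frac{\pi}{4} \right) - \sin\left( \frac{\pi i (n + .5)}{M} - (2i+1)\frac{\pi}{4} \right) \right],$$

where $i \in \{0, \ldots, M-1\}$ and $n \in \{0, \ldots, 2M-1\}$ [link], [link]. Consider a three-band example with
length six filters. In this case, $K = 2$, and therefore one has one projection $P_1$ and the matrix $V_0$. The
projection is one-dimensional and given by the Householder parameter

Equation:

$$v_1^T = \frac{1}{\sqrt{6}} \begin{bmatrix} 1 & -2 & 1 \end{bmatrix} \quad \text{and} \quad V_0 = \frac{1}{\sqrt{3}} \begin{bmatrix} 1 & 1 & 1 \\ -\frac{\sqrt{3}+1}{2} & 1 & \frac{\sqrt{3}-1}{2} \\ \frac{\sqrt{3}-1}{2} & 1 & -\frac{\sqrt{3}+1}{2} \end{bmatrix}.$$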

The third example is another Type 1 modulated filter bank with $M = 4$ and $N = 8$. The filters are given
in [link]. $H_p(z)$ has the following factorization

Equation:

$$H_p(z) = \left[ I - P_1 + z^{-1} P_1 \right] V_0,$$

where $P_1$ is a two-dimensional projection $P_1 = v_1 v_1^T + v_2 v_2^T$ (notice the arbitrary choice of $v_1$ and $v_2$)
given by

Equation:

$$v_1 = \begin{bmatrix} 0.41636433418450 \\ -0.78450701561376 \\ 0.32495344564406 \\ 0.32495344564406 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 0.00000000000000 \\ -0.14088210492943 \\ 0.50902478635868 \\ -0.84914427477499 \end{bmatrix}$$

and

Equation:

$$V_0 = \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ -\sqrt{2} & 0 & \sqrt{2} & 0 \\ 0 & \sqrt{2} & 0 & -\sqrt{2} \\ 1 & -1 & 1 & -1 \end{bmatrix}.$$

Notice that there are infinitely many choices of $v_1$ and $v_2$ that give rise to the same projection $P_1$.
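The numbers quoted for the four-band example can be sanity-checked directly. The sketch below rebuilds $P_1$ and $V_0$, forms the two polyphase coefficients $(I - P_1)V_0$ and $P_1 V_0$, and lays them out as filters under the assumed convention $h_i(4k + j) = (h_p(k))_{i,j}$. Tolerances are loose because the vectors are quoted to finite precision.

```python
import numpy as np

v1 = np.array([0.41636433418450, -0.78450701561376,
               0.32495344564406, 0.32495344564406])
v2 = np.array([0.0, -0.14088210492943,
               0.50902478635868, -0.84914427477499])
r2 = np.sqrt(2.0)
V0 = 0.5 * np.array([[1.0, 1.0, 1.0, 1.0],
                     [-r2, 0.0, r2, 0.0],
                     [0.0, r2, 0.0, -r2],
                     [1.0, -1.0, 1.0, -1.0]])

P1 = np.outer(v1, v1) + np.outer(v2, v2)     # rank-two projection
hp0 = (np.eye(4) - P1) @ V0                  # coefficient of z^0
hp1 = P1 @ V0                                # coefficient of z^{-1}
h = np.hstack([hp0, hp1])                    # h[i, n], n = 0..7

print(np.allclose(P1 @ P1, P1, atol=1e-6))
print(np.allclose(h @ h.T, np.eye(4), atol=1e-6))
print(h[0].sum())                            # sum of the lowpass filter
```

The lowpass sum equals the first-row sum of $V_0$, namely $2 = \sqrt{M}$.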


M-band Wavelet Tight Frames 


In [link], Theorem 7, while discussing the properties of $M$-band wavelet systems, we saw that the
lowpass filter $h_0$ ($h$ in the notation used there) must satisfy the linear constraint $\sum_n h_0(n) = \sqrt{M}$.
Otherwise, a scaling function with nonzero integral could not exist. It turns out that this is precisely the
only condition that an FIR unitary filter bank has to satisfy in order for it to generate an $M$-band wavelet
system [link], [link]. Indeed, if this linear constraint is not satisfied the filter bank does not generate a
wavelet system. This single linear constraint (for unitary filter banks) also implies that $\sum_n h_i(n) = 0$ for
$i \in \{1, 2, \ldots, M-1\}$ (because of Eqn. [link]). We now give the precise result connecting FIR unitary
filter banks and wavelet tight frames.


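Theorem 46 Given an FIR unitary filter bank with $\sum_n h_0(n) = \sqrt{M}$, there exists a unique, compactly
supported, scaling function $\psi_0(t) \in L^2(\mathbb{R})$ (with support in $\left[0, \frac{N-1}{M-1}\right]$, assuming $h_0$ is supported in
$[0, N-1]$) determined by the scaling recursion:

Equation:

$$\psi_0(t) = \sqrt{M} \sum_k h_0(k)\, \psi_0(Mt - k).$$

Define wavelets, $\psi_i(t)$,

Equation:

$$\psi_i(t) = \sqrt{M} \sum_k h_i(k)\, \psi_0(Mt - k), \quad i \in \{1, 2, \ldots, M-1\},$$

and functions, $\psi_{i,j,k}(t)$,

Equation:

$$\psi_{i,j,k}(t) = M^{j/2}\, \psi_i(M^j t - k).$$

Then $\{\psi_{i,j,k}\}$ forms a tight frame for $L^2(\mathbb{R})$. That is, for all $f \in L^2(\mathbb{R})$

Equation:

$$f(t) = \sum_{i=1}^{M-1} \sum_{j,k=-\infty}^{\infty} \langle f, \psi_{i,j,k} \rangle\, \psi_{i,j,k}(t).$$

Also,

Equation:

$$f(t) = \sum_k \langle f, \psi_{0,0,k} \rangle\, \psi_{0,0,k}(t) + \sum_{i=1}^{M-1} \sum_{j=0}^{\infty} \sum_{k=-\infty}^{\infty} \langle f, \psi_{i,j,k} \rangle\, \psi_{i,j,k}(t).$$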
k 


Remark: A similar result relates general FIR (not necessarily unitary) filter banks and $M$-band wavelet
frames [link], [link], [link].

Starting with [link], one can calculate the scaling function using either successive approximation or
interpolation on the $M$-adic rationals, i.e., exactly as in the two-band case in Chapter [link].
Equation [link] then gives the wavelets in terms of the scaling function. As in the two-band case, the
functions $\psi_i(t)$, so constructed, invariably turn out highly irregular and sometimes fractal. The solution,
once again, is to require that several moments of the scaling function (or equivalently the moments of the
scaling filter $h_0$) are zero. This motivates the definition of $K$-regular $M$-band scaling filters: A unitary
scaling filter $h_0$ is said to be $K$-regular if its $Z$-transform can be written in the form

Equation:

$$H_0(z) = \left[ \frac{1 + z^{-1} + \ldots + z^{-(M-1)}}{M} \right]^K Q(z),$$

for maximal possible $K$. By default, every unitary scaling filter $h_0$ is one-regular (because
$\sum_n h_0(n) = \sqrt{M}$; see [link], Theorem [link] for equivalent characterizations of $K$-regularity). Each of
the $K$ identical factors in Eqn. [link] adds an extra linear constraint on $h_0$ (actually, it is one linear
constraint on each of the $M$ polyphase subsequences of $h_0$; see [link]).
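The successive-approximation computation of the scaling function mentioned above can be sketched as an iterated upsample-and-filter loop. This is a minimal illustration, not the book's algorithm: the function name `cascade` is hypothetical, and the iteration starts from a unit sample. After `J` iterations the output approximates $\psi_0$ on the grid $t = n / M^J$.

```python
import numpy as np

def cascade(h0, M, iterations):
    """Approximate samples of the scaling function on the grid t = n / M**iterations."""
    h0 = np.asarray(h0, dtype=float)
    a = np.array([1.0])
    for _ in range(iterations):
        up = np.zeros(M * (a.size - 1) + 1)
        up[::M] = a                          # insert M-1 zeros between samples
        a = np.sqrt(M) * np.convolve(up, h0)
    return a

s3 = np.sqrt(3.0)
d4 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))

J = 5
phi = cascade(d4, M=2, iterations=J)
print(phi.size / 2 ** J)      # approximate support length, close to (N-1)/(M-1)
print(phi.sum() / 2 ** J)     # Riemann-sum estimate of the integral
```

For the Haar filter the iteration returns a sampled box function, and for any filter with $\sum_n h_0(n) = \sqrt{M}$ the Riemann sum of the samples stays equal to one at every iteration.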


There is no simple relationship between the smoothness of the scaling function and $K$-regularity.
However, the smoothness of the maximally regular scaling filter, $h_0$, with fixed filter length $N$, tends to
be an increasing function of $N$. Perhaps one can argue that $K$-regularity is an important concept
independent of the smoothness of the associated wavelet system. $K$-regularity implies that the moments
of the wavelets vanish up to order $K - 1$, and therefore, functions can be better approximated by using
just the scaling function and its translates at a given scale. Formulae exist for $M$-band maximally regular
$K$-regular scaling filters (i.e., only the sequence $h_0$) [link]. Using the Householder parameterization, one
can then design the remaining filters in the filter bank.


The linear constraints on $h_0$ that constitute $K$-regularity become nonexplicit nonlinear constraints on the
Householder parameterization of the associated filter bank. However, one-regularity can be explicitly
incorporated and this gives a parameterization of all $M$-band compactly supported wavelet tight frames.
To see this, consider the following two factorizations of $H_p(z)$ of a unitary filter bank.

Equation:

$$H_p(z) = \left\{ \prod_{k=K-1}^{1} \left[ I - P_k + z^{-1} P_k \right] \right\} V_0,$$

and

Equation:

$$H_p^T(z) = \left\{ \prod_{k=K-1}^{1} \left[ I - Q_k + z^{-1} Q_k \right] \right\} W_0.$$

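Since $H_p(1) = V_0$ and $H_p^T(1) = W_0$, $V_0 = W_0^T$. The first column of $W_0$ is the unit vector
$[H_{0,0}(1), H_{0,1}(1), \ldots, H_{0,M-1}(1)]^T$. Therefore,

Equation:

$$\sum_{k=0}^{M-1} H_{0,k}(1)^2 = 1.$$

But since $\sum_n h_0(n) = H_0(1) = \sqrt{M}$,

Equation:

$$\sum_{k=0}^{M-1} H_{0,k}(1) = \sqrt{M}.$$

By the Cauchy-Schwarz inequality, a unit vector of length $M$ whose entries sum to $\sqrt{M}$ must have all entries equal.
Therefore, for all $k$, $H_{0,k}(1) = \frac{1}{\sqrt{M}}$. Hence, the first row of $V_0$ is $[1/\sqrt{M}, 1/\sqrt{M}, \ldots, 1/\sqrt{M}]$. In
other words, a unitary filter bank gives rise to a WTF iff the first row of $V_0$ in the Householder
parameterization is the vector with all entries $1/\sqrt{M}$.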
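The condition on the first row of $V_0$ can be exercised numerically. The sketch below is an illustration under assumptions: it draws random Householder vectors, builds an orthogonal $V_0$ whose first row is constant via a QR factorization (one of many ways to complete the remaining rows), and then checks the sums of the resulting filters.

```python
import numpy as np

rng = np.random.default_rng(3)
M, K = 3, 3

# Orthogonal V0 whose first row has all entries 1/sqrt(M)
A = rng.standard_normal((M, M))
A[:, 0] = 1.0
Q, _ = np.linalg.qr(A)
V0 = Q.T * np.sign(Q[0, 0])       # fix the overall sign so the first row is positive

# H_p(z) = prod_k [I - v_k v_k^T + z^{-1} v_k v_k^T] V0 with K-1 random unit vectors
hp = [V0]
for _ in range(K - 1):
    v = rng.standard_normal(M)
    v /= np.linalg.norm(v)
    P = np.outer(v, v)
    nxt = [np.zeros((M, M)) for _ in range(len(hp) + 1)]
    for k, c in enumerate(hp):
        nxt[k] += (np.eye(M) - P) @ c
        nxt[k + 1] += P @ c
    hp = nxt

# Filters: h[i, M*k + j] = (hp[k])[i, j]
h = np.array(hp).transpose(1, 0, 2).reshape(M, -1)
sums = h.sum(axis=1)
print(sums)
```

The first sum is $\sqrt{M}$ and the remaining ones vanish, since the other rows of $V_0$ are orthogonal to the constant vector.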


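Alternatively, consider the Givens factorization of $H_p(z)$ for a two-channel unitary filter bank.

Equation:

$$H_p(z) = \left\{ \prod_{i=K-1}^{1} \begin{bmatrix} \cos\theta_i & z^{-1}\sin\theta_i \\ -\sin\theta_i & z^{-1}\cos\theta_i \end{bmatrix} \right\} \begin{bmatrix} \cos\theta_0 & \sin\theta_0 \\ -\sin\theta_0 & \cos\theta_0 \end{bmatrix}.$$

Since for a WTF we require

Equation:

$$\begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} = \begin{bmatrix} H_{0,0}(1) & H_{0,1}(1) \\ H_{1,0}(1) & H_{1,1}(1) \end{bmatrix} = \begin{bmatrix} \cos(\Theta) & \sin(\Theta) \\ -\sin(\Theta) & \cos(\Theta) \end{bmatrix},$$

we have $\Theta = \sum_{k=0}^{K-1} \theta_k = \frac{\pi}{4}$. This is the condition for the lattice parameterization to be associated with
wavelets.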


Modulated Filter Banks 


Filter bank design typically entails optimization of the filter coefficients to maximize some goodness 
measure subject to the perfect reconstruction constraint. Being a constrained (or unconstrained for 
parameterized unitary filter bank design) nonlinear programming problem, numerical optimization leads 
to local minima, with the problem exacerbated when there are a large number of filter coefficients. To 
alleviate this problem one can try to impose structural constraints on the filters. For example, if [link] is 
the desired ideal response, one can impose the constraint that all analysis (synthesis) filters are obtained 
by modulation of a single “prototype” analysis (synthesis) filter. This is the basic idea behind modulated 
filter banks [link], [link], [link], [link], [link], [link], [link], [link]. In what follows, we only consider the 
case where the number of filters is equal to the downsampling factor; i.e., $L = M$.


The frequency responses in [link] can be obtained by shifting the response of an ideal lowpass filter
(supported in $\left[ -\frac{\pi}{2M}, \frac{\pi}{2M} \right]$) by $\left( i + \frac{1}{2} \right) \frac{\pi}{M}$, $i \in \{0, \ldots, M-1\}$. This can be achieved by modulating
with a cosine (or sine) with appropriate frequency and arbitrary phase. However, some choices of phase
may be incompatible with perfect reconstruction. A general choice of phase (and hence modulation, that
covers all modulated filter banks of this type) is given by the following definition of the analysis and
synthesis filters:

Equation:

$$h_i(n) = h(n) \cos\left( \frac{\pi}{M} \left( i + \frac{1}{2} \right) \left( n - \frac{\alpha}{2} \right) \right), \quad i \in \mathcal{R}(M)$$

Equation:

$$g_i(n) = g(n) \cos\left( \frac{\pi}{M} \left( i + \frac{1}{2} \right) \left( n + \frac{\alpha}{2} \right) \right), \quad i \in \mathcal{R}(M)$$


Here $\alpha$ is an integer parameter called the modulation phase. Now one can substitute these forms for the
filters in [link] to explicitly get PR constraints on the prototype filters $h$ and $g$. This is a straightforward
algebraic exercise, since the summation over $i$ in [link] is a trigonometric sum that can be easily
computed. It turns out that the PR conditions depend only on the parity of the modulation phase $\alpha$. Hence
without loss of generality, we choose $\alpha \in \{M-1, M-2\}$, other choices being incorporated as a
preshift into the prototype filters $h$ and $g$.


Thus there are two types of MFBs depending on the choice of modulation phase:

Equation:

$$\alpha = \begin{cases} M - 1 & \text{Type 1 Filter Bank} \\ M - 2 & \text{Type 2 Filter Bank} \end{cases}$$
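The modulation itself is a one-line computation once a prototype is chosen. The sketch below generates a Type 1 bank ($\alpha = M - 1$) from a length-$2M$ sine-window prototype. That prototype is an assumption made for illustration (it is not prescribed by the definition above), chosen because it yields a unitary bank for which the checks are easy to state. The function name `modulated_bank` is hypothetical.

```python
import numpy as np

def modulated_bank(h, M, alpha):
    """h_i(n) = h(n) cos((pi/M)(i + 1/2)(n - alpha/2)) for i = 0..M-1."""
    n = np.arange(h.size)
    i = np.arange(M)[:, None]
    return h * np.cos(np.pi / M * (i + 0.5) * (n - alpha / 2.0))

M = 3
n = np.arange(2 * M)
# Assumed prototype: a sine window of length 2M (illustrative choice)
h = np.sqrt(2.0 / M) * np.sin(np.pi * (n + 0.5) / (2 * M))
bank = modulated_bank(h, M, alpha=M - 1)      # Type 1

gram = bank @ bank.T                  # inner products of the filters
cross = bank[:, :M] @ bank[:, M:].T   # inner products after a shift by M
print(np.round(gram, 12))
print(np.round(cross, 12))
print(bank[0].sum())                  # lowpass sum
```

Here `gram` is the identity and `cross` vanishes, which is the direct unitarity condition for filters of length $2M$. Only the $2M$ prototype coefficients are design variables, rather than $2M^2$ independent filter coefficients.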


The PR constraints on $h$ and $g$ are quite messy to write down without more notational machinery. But the
basic nature of the constraints can be easily understood pictorially. Let the $M$ polyphase components of $h$
and $g$ respectively be partitioned into pairs as suggested in [link]. Each polyphase pair from $h$ and an
associated polyphase pair from $g$ (i.e., those four sequences) satisfy the PR conditions for a two-channel filter
bank. In other words, these subsequences could be used as analysis and synthesis filters respectively in a
two-channel PR filter bank. As seen in [link], some polyphase components are not paired. The constraints
on these sequences that PR imposes will be explicitly described soon. Meanwhile, notice that the PR
constraints on the coefficients are decoupled into roughly $M/2$ independent sets of constraints (since
there are roughly $M/2$ PR pairs in [link]). To quantify this, define $J$:

Equation:

$$J = \begin{cases}
\frac{M}{2} & \text{Type 1, } M \text{ even} \\
\frac{M-1}{2} & \text{Type 1, } M \text{ odd} \\
\frac{M-2}{2} & \text{Type 2, } M \text{ even} \\
\frac{M-1}{2} & \text{Type 2, } M \text{ odd.}
\end{cases}$$

In other words, the MFB PR constraint decomposes into a set of $J$ two-channel PR constraints and a few
additional conditions on the unpaired polyphase components of $h$ and $g$.


Figure: Two-Channel PR Pairs in a PR MFB (last panel: (d) Type 2 MFB, $M$ odd).


We first define $M$ polyphase components of the analysis and synthesis prototype filters, viz., $P_l(z)$ and
$Q_l(z)$ respectively. We split these sequences further into their even and odd components to give $P_{l,0}(z)$,
$P_{l,1}(z)$, $Q_{l,0}(z)$ and $Q_{l,1}(z)$ respectively. More precisely, let

Equation:

$$H(z) = \sum_{l=0}^{M-1} z^{-l} P_l(z^M) = \sum_{l=0}^{M-1} z^{-l} \left( P_{l,0}(z^{2M}) + z^{-M} P_{l,1}(z^{2M}) \right),$$

Equation:

$$G(z) = \sum_{l=0}^{M-1} z^{l} Q_l(z^M) = \sum_{l=0}^{M-1} z^{l} \left( Q_{l,0}(z^{2M}) + z^{M} Q_{l,1}(z^{2M}) \right),$$

and let

Equation:

$$\mathbf{P}_l(z) = \begin{bmatrix} P_{l,0}(z) & P_{l,1}(z) \\ P_{\alpha-l,0}(z) & -P_{\alpha-l,1}(z) \end{bmatrix}$$

with $\mathbf{Q}_l(z)$ defined similarly. Let $I$ be the $2 \times 2$ identity matrix.


Theorem 47 (Modulated Filter Banks PR Theorem) A modulated filter bank (Type 1 or Type 2) (as 
defined in [link] and [link]) is PR iff for $l \in \mathcal{R}(J)$ 
Equation: 

$$\mathbf{P}_l(z)\,\mathbf{Q}_l^{T}(z) = \frac{2}{M} I$$

and furthermore if $\alpha$ is even $P_{\alpha/2,0}(z)\,Q_{\alpha/2,0}(z) = \frac{1}{M}$. In the Type 2 case, we further require 
$P_{M-1}(z)\,Q_{M-1}(z) = \frac{2}{M}$. 


The result says that $P_l$, $P_{\alpha-l}$, $Q_l$ and $Q_{\alpha-l}$ form analysis and synthesis filters of a two-channel PR filter 
bank ([link] in Z-transform domain). 


Modulated filter bank design involves choosing h and g to optimize some goodness criterion while 
subject to the constraints in the theorem above. 


Unitary Modulated Filter Bank 


In a unitary bank, the filters satisfy $g_i(n) = h_i(-n)$. From [link] and [link], it is clear that in a 
modulated filter bank if $g(n) = h(-n)$, then $g_i(n) = h_i(-n)$. Imposing this restriction (that the 
analysis and synthesis prototype filters are reflections of each other) gives PR conditions for unitary 
modulated filter banks. That $g(n) = h(-n)$ means that $P_l(z) = Q_l(z^{-1})$ and therefore 
$\mathbf{Q}_l(z) = \mathbf{P}_l(z^{-1})$. Indeed, for PR, we require 

Equation: 

$$\mathbf{P}_l(z)\,\mathbf{P}_l^{T}(z^{-1}) = \frac{2}{M} I.$$

This condition is equivalent to requiring that $P_l$ and $P_{\alpha-l}$ are analysis filters of a two-channel unitary 
filter bank. Equivalently, for $l \in \mathcal{R}(M)$, $P_{l,0}$ and $P_{l,1}$ are power-complementary. 


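Corollary 6 (Unitary MFB PR Theorem) A modulated filter bank (Type 1 or Type 2) is unitary iff for 
$l \in \mathcal{R}(J)$, $P_{l,0}(z)$ and $P_{l,1}(z)$ are power complementary. 
Equation: 

$$P_{l,0}(z)\,P_{l,0}(z^{-1}) + P_{l,1}(z)\,P_{l,1}(z^{-1}) = \frac{2}{M}, \quad l \in \mathcal{R}(M)$$

Furthermore, when $\alpha$ is even $P_{\alpha/2,0}(z)\,P_{\alpha/2,0}(z^{-1}) = \frac{1}{M}$ (i.e., $P_{\alpha/2,0}(z)$ has to be $\frac{1}{\sqrt{M}} z^{k}$ for some integer 
k). In the Type 2 case, we further require $P_{M-1}(z)\,P_{M-1}(z^{-1}) = \frac{2}{M}$ (i.e., $P_{M-1}(z)$ has to be $\sqrt{\frac{2}{M}}\, z^{k}$ 
for some integer k). 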


Unitary modulated filter bank design entails the choice of h, the analysis prototype filter. There are J 
associated two-channel unitary filter banks each of which can be parameterized using the lattice 
parameterization. In addition, depending on whether the filter bank is Type 2 and/or $\alpha$ is even, one has to 
choose the locations of the delays. 


For the prototype filter of a unitary MFB to be linear phase, it is necessary that 
Equation: 

$$P_{\alpha-l}(z) = z^{-2k+1}\, P_l(z^{-1}),$$

for some integer k. In this case, the prototype filter (if FIR) is of length 2Mk and symmetric about 
$(Mk - \frac{1}{2})$ in the Type 1 case, and of length 2Mk − 1 and symmetric about (Mk − 1) in the Type 2 case (for both Class A 
and Class B MFBs). In the FIR case, one can obtain linear-phase prototype filters by using the lattice 
parameterization [link] of two-channel unitary filter banks. Filter banks with FIR linear-phase prototype 
filters will be said to be canonical. In this case, $P_l(z)$ is typically a filter of length 2k for all l. For 
canonical modulated filter banks, one has to check power complementarity only for $l \in \mathcal{R}(J)$. 


Modulated Wavelet Tight Frames 


For all M, there exist M-band modulated WTFs. The simple linear constraint on $h_0$ becomes a set of J 
linear constraints, one on each of the J two-channel unitary lattices associated with the MFB. 


Theorem 48 (Modulated Wavelet Tight Frames Theorem) Every compactly supported modulated 
WTF is associated with an FIR unitary MFB and is parameterized by J unitary lattices such that the sum 
of the angles in the lattices satisfies (for $l \in \mathcal{R}(J)$) Eqn. [link]. 


Equation: 

$$\sum_{k} \theta_{l,k} \stackrel{\mathrm{def}}{=} \Theta_l = \frac{\pi}{4} + \frac{\pi}{2M}\left(\frac{\alpha}{2} - l\right).$$


If a canonical MFB has Jk parameters, the corresponding WTF has J(k — 1) parameters. 
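
As a small numerical illustration, the sketch below evaluates the angle-sum target for each lattice. It assumes the condition reads Θ_l = π/4 + (π/2M)(α/2 − l), with α = M − 1 (Type 1) or α = M − 2 (Type 2); the function name is hypothetical.

```python
import math


def target_angle_sum(M, mfb_type, l):
    """Angle sum Theta_l required of lattice l for a modulated WTF.

    Assumption: Theta_l = pi/4 + (pi / (2M)) * (alpha/2 - l), where
    alpha = M - 1 for Type 1 and alpha = M - 2 for Type 2.
    """
    if mfb_type not in (1, 2):
        raise ValueError("mfb_type must be 1 or 2")
    alpha = M - 1 if mfb_type == 1 else M - 2
    return math.pi / 4 + (math.pi / (2 * M)) * (alpha / 2 - l)


if __name__ == "__main__":
    M = 4
    for l in range(M // 2):
        print(l, target_angle_sum(M, 1, l))
```

Under this assumption the Type 1 target simplifies to π/2 − π(2l + 1)/(4M), which is why a single angle per lattice leads to a sine-shaped prototype.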


Notice that even though the PR conditions for MFBs depended on whether it is Type 1 or Type 2, the 
MWTF conditions are identical. Now consider a Type 1 or Type 2 MFB with one angle parameter per 
lattice; i.e., N = 2M (Type 1) or N = 2M − 1 (Type 2). This angle parameter is specified by the 
MWTF theorem above if we want associated wavelets. This choice of angle parameters leads to a 
particularly simple form for the prototype filter. 


In the Type 1 case [link], [link], 
Equation: 


and therefore 
Equation: 


hi (n) = aaron (ste) (2% 4 7) sin (= (2% 4 nz)], 


In the Type 2 case [link], 
Equation: 


and hence 
Equation: 


hat) = gue si (ae (2% 4 7) sin a (24 4 ale 


Linear Phase Filter Banks 


In some applications, it is desirable to have filter banks with linear-phase filters [link]. The linear-phase 
constraint (like the modulation constraint studied earlier) reduces the number of free parameters in the 
design of a filter bank. Unitary linear-phase filter banks have been studied recently [link], [link]. In this 
section we develop algebraic characterizations of certain types of linear-phase filter banks that can be used as a 
starting point for designing such filter banks. 


In this section, we assume that the desired frequency responses are as in [link]. For simplicity we also 
assume that the number of channels, M, is an even integer and that the filters are FIR. It should be 
possible to extend the results that follow to the case when M is an odd integer in a straightforward 
manner. 


Consider an M-channel FIR filter bank with filters whose passbands approximate ideal filters. Several 
transformations relate the M ideal filter responses. We have already seen one example where all the ideal 
filters are obtained by modulation of a prototype filter. We now look at other types of transformations that 
relate the filters. Specifically, the ideal frequency response of $h_{M-1-i}$ can be obtained by shifting the 
response of $h_i$ by $\pi$. This corresponds either to the restriction that 

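Equation: 

$$h_{M-1-i}(n) = (-1)^{n} h_i(n); \quad H_{M-1-i}(z) = H_i(-z); \quad H_{M-1-i}(\omega) = H_i(\omega + \pi),$$

or to the restriction that 
Equation: 

$$h_{M-1-i}(n) = (-1)^{n} h_i(N-1-n); \quad H_{M-1-i}(z) = H_i^{R}(-z); \quad H_{M-1-i}(\omega) = H_i^{\star}(\omega + \pi)$$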


where N is the filter length and for polynomial H(z), $H^{R}(z)$ denotes its reflection polynomial (i.e. the 
polynomial with coefficients in the reversed order). The former will be called pairwise-shift (or PS) 
symmetry (it is also known as pairwise-mirror image symmetry [link]), while the latter will be called 
pairwise-conjugated-shift (or PCS) symmetry (also known as pairwise-symmetry [link]). Both these 
symmetries relate pairs of filters in the filter bank. Another type of symmetry occurs when the filters 
themselves are symmetric or linear-phase. The only type of linear-phase symmetry we will consider is of 
the form 

Equation: 

$$h_i(n) = \pm h_i(N-1-n); \quad H_i(z) = \pm H_i^{R}(z),$$

where the filters are all of fixed length N, and the symmetry is about $\frac{N-1}{2}$. For an M-channel linear-phase 
filter bank (with M an even integer), M/2 filters each are even-symmetric and odd-symmetric 
respectively [link]. 
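
The frequency-domain meaning of the two pairwise symmetries can be checked numerically. The sketch below is illustrative only: the helper names are hypothetical, the test filter is arbitrary, and the filters are assumed real.

```python
import numpy as np


def dtft(h, w):
    """Evaluate H(w) = sum_n h(n) exp(-j w n) at a single frequency w."""
    h = np.asarray(h, dtype=float)
    n = np.arange(len(h))
    return np.sum(h * np.exp(-1j * w * n))


def ps_partner(h):
    """PS symmetry: h'(n) = (-1)^n h(n)."""
    h = np.asarray(h, dtype=float)
    return (-1.0) ** np.arange(len(h)) * h


def pcs_partner(h):
    """PCS symmetry: h'(n) = (-1)^n h(N-1-n)."""
    return ps_partner(np.asarray(h, dtype=float)[::-1])


if __name__ == "__main__":
    h = np.array([0.2, 0.5, 0.9, 0.4, -0.1, 0.05])
    w = 0.3
    # PS: the response is shifted by pi.
    print(dtft(ps_partner(h), w), dtft(h, w + np.pi))
    # PCS: the magnitude response is shifted by pi (the phase also changes).
    print(abs(dtft(pcs_partner(h), w)), abs(dtft(h, w + np.pi)))
```

For the PS partner the complex responses agree exactly; for the PCS partner only the magnitudes agree, because time reversal of a real filter conjugates the response and adds a linear phase term.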


We now look at the structural restrictions on $H_p(z)$, the polyphase component matrix of the analysis 
bank, that these three types of symmetries impose. Let J denote the exchange matrix with ones on the 
antidiagonal. Postmultiplying a matrix A by J is equivalent to reversing the order of the columns of A, 
and premultiplying is equivalent to reversing the order of the rows of A. Let V denote the sign-alternating 
matrix, the diagonal matrix of alternating ±1's. Postmultiplying by V alternates the signs of 
the columns of A, while premultiplying alternates the signs of the rows of A. The polyphase components 
of H(z) are related to the polyphase components of $H^{R}(z)$ by reflection and reversal of the ordering of 
the components. Indeed, if H(z) is of length Mm, and $H(z) = \sum_{l=0}^{M-1} z^{-l} H_l(z^{M})$, then, 

Equation: 

$$\begin{aligned} H^{R}(z) &= z^{-Mm+1} \sum_{l=0}^{M-1} z^{l} H_l(z^{-M}) \\ &= \sum_{l=0}^{M-1} z^{-(M-1-l)} \left(z^{-mM+M} H_l(z^{-M})\right) \\ &= \sum_{l=0}^{M-1} z^{-l} H^{R}_{M-1-l}(z^{M}). \end{aligned}$$

Therefore 
Equation: 

$$(H^{R})_l(z) = (H_{M-1-l})^{R}(z)$$

and for linear-phase H(z), since $H^{R}(z) = \pm H(z)$, 
Equation: 

$$H_l(z) = \pm (H_{M-1-l})^{R}(z).$$
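
The relation between the polyphase components of a filter and those of its reflection can be verified directly on coefficient sequences. The sketch below assumes Type 1 polyphase components (component l collects h(l), h(l+M), ...) and a filter length that is a multiple of M; the helper names are hypothetical.

```python
import numpy as np


def polyphase(h, M):
    """Type 1 polyphase components: component l is h[l], h[l+M], h[l+2M], ..."""
    h = np.asarray(h)
    if len(h) % M:
        raise ValueError("filter length must be a multiple of M")
    return [h[l::M] for l in range(M)]


def reflect(h):
    """Reflection polynomial: coefficients in reversed order."""
    return np.asarray(h)[::-1]


if __name__ == "__main__":
    M, m = 3, 4
    h = np.arange(M * m)
    comps = polyphase(h, M)
    comps_of_reflection = polyphase(reflect(h), M)
    for l in range(M):
        # Component l of the reflection equals the reflection of component M-1-l.
        print(l, comps_of_reflection[l], reflect(comps[M - 1 - l]))
```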


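Lemma 2 For even M, $H_p(z)$ is of the form 


• PS Symmetry 

Equation: 

$$\begin{bmatrix} W_0(z) & W_1(z) \\ J W_0(z) V & (-1)^{M/2} J W_1(z) V \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & J \end{bmatrix} \begin{bmatrix} W_0(z) & W_1(z) \\ W_0(z) V & (-1)^{M/2} W_1(z) V \end{bmatrix}$$

• PCS Symmetry 

Equation: 

$$\begin{bmatrix} W_0(z) & W_1(z) J \\ J W_1^{R}(z) V & (-1)^{M/2} J W_0^{R}(z) J V \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & J \end{bmatrix} \begin{bmatrix} W_0(z) & W_1(z) J \\ W_1^{R}(z) V & (-1)^{M/2} W_0^{R}(z) J V \end{bmatrix}$$

• Linear Phase 

Equation: 

$$\begin{bmatrix} W_0(z) & D_0 W_0^{R}(z) J \\ W_1(z) & D_1 W_1^{R}(z) J \end{bmatrix} = \begin{bmatrix} W_0(z) & D_0 W_0^{R}(z) \\ W_1(z) & D_1 W_1^{R}(z) \end{bmatrix} \begin{bmatrix} I & 0 \\ 0 & J \end{bmatrix}$$

or 

$$Q \begin{bmatrix} W_0(z) & W_0^{R}(z) J \\ W_1(z) & -W_1^{R}(z) J \end{bmatrix} = Q \begin{bmatrix} W_0(z) & W_0^{R}(z) \\ W_1(z) & -W_1^{R}(z) \end{bmatrix} \begin{bmatrix} I & 0 \\ 0 & J \end{bmatrix}$$

• Linear Phase and PCS 

Equation: 

$$\begin{bmatrix} W_0(z) & D W_0^{R}(z) J \\ J D W_0(z) V & (-1)^{M/2} J W_0^{R}(z) J V \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & J \end{bmatrix} \begin{bmatrix} W_0(z) & D W_0^{R}(z) J \\ D W_0(z) V & (-1)^{M/2} W_0^{R}(z) J V \end{bmatrix}$$

• Linear Phase and PS 

Equation: 

$$\begin{bmatrix} W_0(z) & D W_0^{R}(z) J \\ J W_0(z) V & (-1)^{M/2} J D W_0^{R}(z) J V \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & J \end{bmatrix} \begin{bmatrix} W_0(z) & D W_0^{R}(z) J \\ W_0(z) V & (-1)^{M/2} D W_0^{R}(z) J V \end{bmatrix}$$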


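Thus in order to generate $H_p(z)$ for all symmetries other than PS, we need a mechanism that generates 
a pair of matrices and their reflections (i.e., $W_0(z)$, $W_1(z)$, $W_0^{R}(z)$ and $W_1^{R}(z)$). In the scalar case, there 
are two well-known lattice structures that generate such pairs. The first case is the orthogonal lattice 
[link], while the second is the linear-prediction lattice [link]. A Kth-order orthogonal lattice is generated 
by the product 

Equation: 

$$\begin{bmatrix} Y_0(z) & Y_1(z) \\ -Y_1^{R}(z) & Y_0^{R}(z) \end{bmatrix} \stackrel{\mathrm{def}}{=} X(z) = \left\{ \prod_{i=1}^{K} \begin{bmatrix} a_i & z^{-1} b_i \\ -b_i & z^{-1} a_i \end{bmatrix} \right\} \begin{bmatrix} a_0 & b_0 \\ -b_0 & a_0 \end{bmatrix}.$$

This lattice is always invertible (unless $a_i$ and $b_i$ are both zero!), and the inverse is anticausal since 
Equation: 

$$\begin{bmatrix} a_i & z^{-1} b_i \\ -b_i & z^{-1} a_i \end{bmatrix}^{-1} = \frac{1}{a_i^{2} + b_i^{2}} \begin{bmatrix} a_i & -b_i \\ z b_i & z a_i \end{bmatrix}.$$
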
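As we have seen, this lattice plays a fundamental role in the theory of two-channel FIR unitary modulated 
filter banks. The hyperbolic lattice of order K generates the product 


Equation: 

$$\begin{bmatrix} Y_0(z) & Y_1(z) \\ Y_1^{R}(z) & Y_0^{R}(z) \end{bmatrix} \stackrel{\mathrm{def}}{=} X(z) = \left\{ \prod_{i=1}^{K} \begin{bmatrix} a_i & z^{-1} b_i \\ b_i & z^{-1} a_i \end{bmatrix} \right\} \begin{bmatrix} a_0 & b_0 \\ b_0 & a_0 \end{bmatrix},$$

where $Y_0(z)$ and $Y_1(z)$ are of order K. This lattice is invertible only when $a_i^{2} \neq b_i^{2}$ (or equivalently 
$(a_i + b_i)/2$ and $(a_i - b_i)/2$ are nonzero), in which case the inverse is noncausal since 
Equation: 

$$\begin{bmatrix} a_i & z^{-1} b_i \\ b_i & z^{-1} a_i \end{bmatrix}^{-1} = \frac{1}{a_i^{2} - b_i^{2}} \begin{bmatrix} a_i & -b_i \\ -z b_i & z a_i \end{bmatrix}.$$

Since the matrix $\begin{bmatrix} a_i & b_i \\ b_i & a_i \end{bmatrix}$ can be orthogonal iff $\{a_i, b_i\} = \{\pm 1, 0\}$ or $\{a_i, b_i\} = \{0, \pm 1\}$, the (2 x 2) 
matrix generated by the lattice can never be unitary. 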
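
The inverses of the two scalar lattice sections can be checked numerically at a point on the unit circle. In the sketch below the sections are taken to be [[a, b/z], [−b, a/z]] (orthogonal) and [[a, b/z], [b, a/z]] (hyperbolic), which is an assumption about the exact form of the factors; the function names are hypothetical.

```python
import numpy as np


def orthogonal_section(a, b, z):
    """Assumed orthogonal lattice section [[a, b/z], [-b, a/z]]."""
    return np.array([[a, b / z], [-b, a / z]], dtype=complex)


def orthogonal_section_inverse(a, b, z):
    """Anticausal inverse: only nonnegative powers of z appear."""
    return np.array([[a, -b], [z * b, z * a]], dtype=complex) / (a**2 + b**2)


def hyperbolic_section(a, b, z):
    """Assumed hyperbolic lattice section [[a, b/z], [b, a/z]]."""
    return np.array([[a, b / z], [b, a / z]], dtype=complex)


def hyperbolic_section_inverse(a, b, z):
    """Exists only when a**2 != b**2."""
    return np.array([[a, -b], [-z * b, z * a]], dtype=complex) / (a**2 - b**2)


if __name__ == "__main__":
    a, b, z = 0.8, 0.3, np.exp(1j * 0.7)
    F = orthogonal_section(a, b, z)
    G = hyperbolic_section(a, b, z)
    print(np.round(F @ orthogonal_section_inverse(a, b, z), 12))
    print(np.round(G @ hyperbolic_section_inverse(a, b, z), 12))
    # Orthogonal section: F^H F is a multiple of the identity on |z| = 1.
    print(np.round(F.conj().T @ F, 12))
    # Hyperbolic section: off-diagonal terms 2ab/z remain when ab != 0.
    print(np.round(G.conj().T @ G, 12))
```

With both a and b nonzero, the Gram matrix of the hyperbolic section has nonzero off-diagonal entries, which illustrates why the scalar hyperbolic lattice cannot produce a unitary 2 x 2 matrix.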


Formally, it is clear that if we replace the scalars $a_i$ and $b_i$ with square matrices of size M/2 x M/2 then 
we would be able to generate matrix versions of these two lattices which can then be used to generate 
filter banks with the symmetries we have considered. We will shortly see that both the lattices can 
generate unitary matrices, and this will lead to a parameterization of FIR unitary $H_p(z)$ for PCS, linear-phase, 
and PCS plus linear-phase symmetries. We prefer to call the generalization of the orthogonal 
lattice the antisymmetric lattice, and to call the generalization of the hyperbolic lattice the symmetric 
lattice, which should be obvious from the form of the product. The reason for this is that the 
antisymmetric lattice may not generate a unitary matrix transfer function (in the scalar case, the 2 x 2 
transfer function generated is always unitary). The antisymmetric lattice is defined by the product 


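Equation: 

$$X(z) \stackrel{\mathrm{def}}{=} \left\{ \prod_{i=1}^{K} \begin{bmatrix} A_i & z^{-1} B_i \\ -B_i & z^{-1} A_i \end{bmatrix} \right\} \begin{bmatrix} A_0 & B_0 \\ -B_0 & A_0 \end{bmatrix},$$

where $A_i$ and $B_i$ are constant square matrices of size M/2 x M/2. It is readily verified that X(z) is of 
the form 
Equation: 

$$X(z) = \begin{bmatrix} Y_0(z) & Y_1(z) \\ -Y_1^{R}(z) & Y_0^{R}(z) \end{bmatrix}.$$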


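Given X(z), its invertibility is equivalent to the invertibility of the constant matrices 


Equation: 

$$\begin{bmatrix} A_i & B_i \\ -B_i & A_i \end{bmatrix} \quad \text{since} \quad \begin{bmatrix} A_i & z^{-1} B_i \\ -B_i & z^{-1} A_i \end{bmatrix} = \begin{bmatrix} A_i & B_i \\ -B_i & A_i \end{bmatrix} \begin{bmatrix} I & 0 \\ 0 & z^{-1} I \end{bmatrix},$$

which, in turn, is related to the invertibility of the complex matrices $C_i = (A_i + \imath B_i)$ and 
$D_i = (A_i - \imath B_i)$, since 

Equation: 

$$\frac{1}{2} \begin{bmatrix} I & I \\ \imath I & -\imath I \end{bmatrix} \begin{bmatrix} C_i & 0 \\ 0 & D_i \end{bmatrix} \begin{bmatrix} I & -\imath I \\ I & \imath I \end{bmatrix} = \begin{bmatrix} A_i & B_i \\ -B_i & A_i \end{bmatrix}.$$

Moreover, the orthogonality of the matrix is equivalent to the unitariness of the complex matrix $C_i$ (since 
$D_i$ is just its complex conjugate). Since an arbitrary complex matrix of size M/2 x M/2 is determined 
by precisely $2\binom{M/2}{2}$ parameters, each of the matrices $\begin{bmatrix} A_i & B_i \\ -B_i & A_i \end{bmatrix}$ has that many degrees of freedom. 
Clearly when these matrices are orthogonal X(z) is unitary (on the unit circle) and $X^{T}(z^{-1}) X(z) = I$. 
For unitary X(z) the converse is also true as will be shortly proved. 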
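The symmetric lattice is defined by the product 


Equation: 

$$X(z) \stackrel{\mathrm{def}}{=} \left\{ \prod_{i=1}^{K} \begin{bmatrix} A_i & z^{-1} B_i \\ B_i & z^{-1} A_i \end{bmatrix} \right\} \begin{bmatrix} A_0 & B_0 \\ B_0 & A_0 \end{bmatrix}.$$

Once again $A_i$ and $B_i$ are constant square matrices, and it is readily verified that X(z) written as a 
product above is of the form 
Equation: 

$$X(z) = \begin{bmatrix} Y_0(z) & Y_1(z) \\ Y_1^{R}(z) & Y_0^{R}(z) \end{bmatrix}.$$

The invertibility of X(z) is equivalent to the invertibility of 
Equation: 

$$\begin{bmatrix} A_i & B_i \\ B_i & A_i \end{bmatrix} \quad \text{since} \quad \begin{bmatrix} A_i & z^{-1} B_i \\ B_i & z^{-1} A_i \end{bmatrix} = \begin{bmatrix} A_i & B_i \\ B_i & A_i \end{bmatrix} \begin{bmatrix} I & 0 \\ 0 & z^{-1} I \end{bmatrix},$$

which in turn is equivalent to the invertibility of $C_i = (A_i + B_i)$ and $D_i = (A_i - B_i)$ since 
Equation: 

$$\frac{1}{2} \begin{bmatrix} I & I \\ I & -I \end{bmatrix} \begin{bmatrix} C_i & 0 \\ 0 & D_i \end{bmatrix} \begin{bmatrix} I & I \\ I & -I \end{bmatrix} = \begin{bmatrix} A_i & B_i \\ B_i & A_i \end{bmatrix}.$$

The orthogonality of the constant matrix is equivalent to the orthogonality of the real matrices $C_i$ and $D_i$, 
and since each real orthogonal matrix of size M/2 x M/2 is determined by $\binom{M/2}{2}$ parameters, the 
constant orthogonal matrices have $2\binom{M/2}{2}$ degrees of freedom. Clearly when the matrices are orthogonal 
$X^{T}(z^{-1}) X(z) = I$. For the hyperbolic lattice too, the converse is true. 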
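
The link between orthogonality of the constant block matrices and the matrices C_i and D_i can be checked numerically. The sketch below builds the symmetric block [[A, B], [B, A]] from real orthogonal C = A + B and D = A − B, and the antisymmetric block [[A, B], [−B, A]] from a complex unitary C = A + iB. The helper names and the random test matrices are illustrative assumptions.

```python
import numpy as np


def random_orthogonal(n, rng):
    """Random real orthogonal matrix via QR of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q


def random_unitary(n, rng):
    """Random complex unitary matrix via QR of a complex Gaussian matrix."""
    g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    q, _ = np.linalg.qr(g)
    return q


def symmetric_block(C, D):
    """[[A, B], [B, A]] with A = (C + D)/2 and B = (C - D)/2."""
    A, B = (C + D) / 2, (C - D) / 2
    return np.block([[A, B], [B, A]])


def antisymmetric_block(C):
    """[[A, B], [-B, A]] with A = Re(C) and B = Im(C)."""
    A, B = C.real, C.imag
    return np.block([[A, B], [-B, A]])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 3  # plays the role of M/2
    S = symmetric_block(random_orthogonal(n, rng), random_orthogonal(n, rng))
    T = antisymmetric_block(random_unitary(n, rng))
    print(np.allclose(S @ S.T, np.eye(2 * n)))
    print(np.allclose(T @ T.T, np.eye(2 * n)))
```

Both checks print True, consistent with the block-diagonalizations above: the outer factors are scaled unitary matrices, so the block matrix inherits orthogonality from C_i and D_i.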


We now give a theorem that leads to a parameterization of unitary filter banks with the symmetries we 
have considered (for a proof, see [link]). 


Theorem 49 Let X(z) be a unitary M x M polynomial matrix of degree K. Depending on whether 
X(z) is of the form in [link], or [link], it is generated by an order K antisymmetric or symmetric lattice. 


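Characterization of Unitary $H_p(z)$ - PS Symmetry 


The form of $H_p(z)$ for PS symmetry in [link] can be simplified by a permutation. Let P be the 
permutation matrix that exchanges the first column with the last column, the third column with the third 
from last, etc. That is, 

Equation: 

$$P = \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 & 0 & 1 \\ 0 & 1 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 0 & 0 & \cdots & 1 & 0 & 0 \\ \vdots & & & & & & \vdots \\ 0 & 0 & 1 & \cdots & 0 & 0 & 0 \\ 0 & 0 & 0 & \cdots & 0 & 1 & 0 \\ 1 & 0 & 0 & \cdots & 0 & 0 & 0 \end{bmatrix}.$$

Then the matrix $\begin{bmatrix} W_0(z) & W_1(z) \\ W_0(z) V & (-1)^{M/2} W_1(z) V \end{bmatrix}$ in [link] can be rewritten as $\frac{1}{\sqrt{2}} \begin{bmatrix} W_0'(z) & W_1'(z) \\ -W_0'(z) & W_1'(z) \end{bmatrix} P$, 
and therefore 
Equation: 

$$\begin{aligned} H_p(z) &= \begin{bmatrix} I & 0 \\ 0 & J \end{bmatrix} \begin{bmatrix} W_0(z) & W_1(z) \\ W_0(z) V & (-1)^{M/2} W_1(z) V \end{bmatrix} \\ &= \frac{1}{\sqrt{2}} \begin{bmatrix} I & 0 \\ 0 & J \end{bmatrix} \begin{bmatrix} W_0'(z) & W_1'(z) \\ -W_0'(z) & W_1'(z) \end{bmatrix} P \\ &= \frac{1}{\sqrt{2}} \begin{bmatrix} I & 0 \\ 0 & J \end{bmatrix} \begin{bmatrix} I & I \\ -I & I \end{bmatrix} \begin{bmatrix} W_0'(z) & 0 \\ 0 & W_1'(z) \end{bmatrix} P \\ &= \frac{1}{\sqrt{2}} \begin{bmatrix} I & I \\ -J & J \end{bmatrix} \begin{bmatrix} W_0'(z) & 0 \\ 0 & W_1'(z) \end{bmatrix} P. \end{aligned}$$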
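
The role of the permutation P can be illustrated numerically. The sketch below takes one reading of the description above: with 0-based column indices, P swaps column n with column M − 1 − n for every even n < M/2 and leaves the other columns in place. Under that assumption, P moves all columns with sign −1 under V into the first half and all columns with sign +1 into the second half. The function name is hypothetical.

```python
import numpy as np


def ps_permutation(M):
    """Permutation matrix swapping column n with M-1-n for even n < M/2 (0-based).

    Assumption: this is the permutation described as exchanging the first
    column with the last, the third with the third from last, and so on.
    """
    if M % 2:
        raise ValueError("M must be even")
    perm = np.arange(M)
    for n in range(0, M // 2, 2):
        perm[n], perm[M - 1 - n] = perm[M - 1 - n], perm[n]
    return np.eye(M, dtype=int)[perm]


if __name__ == "__main__":
    M = 8
    P = ps_permutation(M)
    signs = (-1) ** np.arange(M)   # column signs produced by the alternating matrix
    print(P)
    print(signs @ P)               # first half all -1, second half all +1
```

Since P is its own inverse, applying it to the columns of the PS form groups the columns whose lower half equals minus the upper half on the left and those whose lower half equals the upper half on the right, which is the block structure used in the factorization.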


For PS symmetry, one has the following parameterization of unitary filter banks. 


Theorem 50 (Unitary PS Symmetry) $H_p(z)$ of order K forms a unitary PR filter bank with PS 
symmetry iff there exist unitary, order K, M/2 x M/2 matrices $W_0'(z)$ and $W_1'(z)$, such that 
Equation: 

$$H_p(z) = \frac{1}{\sqrt{2}} \begin{bmatrix} I & I \\ -J & J \end{bmatrix} \begin{bmatrix} W_0'(z) & 0 \\ 0 & W_1'(z) \end{bmatrix} P.$$

A unitary $H_p$ with PS symmetry is determined by precisely $2(M/2 - 1)(L_0 + L_1) + 2\binom{M/2}{2}$ 
parameters where $L_0 \geq K$ and $L_1 \geq K$ are the McMillan degrees of $W_0'(z)$ and $W_1'(z)$ respectively. 


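Characterization of Unitary $H_p(z)$ - PCS Symmetry 


In this case 
Equation: 

$$\begin{aligned} H_p(z) &= \begin{bmatrix} I & 0 \\ 0 & J \end{bmatrix} \begin{bmatrix} W_0(z) & W_1(z) J \\ W_1^{R}(z) V & (-1)^{M/2} W_0^{R}(z) J V \end{bmatrix} \\ &\stackrel{\mathrm{def}}{=} \begin{bmatrix} I & 0 \\ 0 & J \end{bmatrix} \begin{bmatrix} W_0' & W_1' J \\ -(W_1')^{R} & (W_0')^{R} J \end{bmatrix} P \\ &= \begin{bmatrix} I & 0 \\ 0 & J \end{bmatrix} \begin{bmatrix} W_0' & W_1' \\ -(W_1')^{R} & (W_0')^{R} \end{bmatrix} \begin{bmatrix} I & 0 \\ 0 & J \end{bmatrix} P. \end{aligned}$$

Hence, from the lemma in "Linear Phase Filter Banks", $H_p(z)$ of unitary filter banks with PCS symmetry can be 
parameterized as follows: 


Theorem 51 $H_p(z)$ forms an order K, unitary filter bank with PCS symmetry iff 
Equation: 

$$H_p(z) = \begin{bmatrix} I & 0 \\ 0 & J \end{bmatrix} \left\{ \prod_{i=1}^{K} \begin{bmatrix} A_i & z^{-1} B_i \\ -B_i & z^{-1} A_i \end{bmatrix} \right\} \begin{bmatrix} A_0 & B_0 \\ -B_0 & A_0 \end{bmatrix} \begin{bmatrix} I & 0 \\ 0 & J \end{bmatrix} P$$

where $\begin{bmatrix} A_i & B_i \\ -B_i & A_i \end{bmatrix}$ are constant orthogonal matrices. $H_p(z)$ is characterized by $2K\binom{M/2}{2}$ parameters. 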

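Characterization of Unitary $H_p(z)$ - Linear-Phase Symmetry 


For the linear-phase case, 
Equation: 

$$\begin{aligned} H_p(z) &= Q \begin{bmatrix} W_0(z) & W_0^{R}(z) \\ W_1(z) & -W_1^{R}(z) \end{bmatrix} \begin{bmatrix} I & 0 \\ 0 & J \end{bmatrix} \\ &= \frac{1}{2} Q \begin{bmatrix} I & I \\ I & -I \end{bmatrix} \begin{bmatrix} W_0(z) + W_1(z) & W_0^{R}(z) - W_1^{R}(z) \\ W_0(z) - W_1(z) & W_0^{R}(z) + W_1^{R}(z) \end{bmatrix} \begin{bmatrix} I & 0 \\ 0 & J \end{bmatrix} \\ &\stackrel{\mathrm{def}}{=} \frac{1}{\sqrt{2}} Q \begin{bmatrix} I & I \\ I & -I \end{bmatrix} \begin{bmatrix} W_0'(z) & (W_1')^{R}(z) \\ W_1'(z) & (W_0')^{R}(z) \end{bmatrix} \begin{bmatrix} I & 0 \\ 0 & J \end{bmatrix}. \end{aligned}$$

Therefore, we have the following Theorem: 


Theorem 52 $H_p(z)$ of order K forms a unitary filter bank with linear-phase filters iff 


Equation: 

$$H_p(z) = \frac{1}{\sqrt{2}} Q \begin{bmatrix} I & I \\ I & -I \end{bmatrix} \left\{ \prod_{i=1}^{K} \begin{bmatrix} A_i & z^{-1} B_i \\ B_i & z^{-1} A_i \end{bmatrix} \right\} \begin{bmatrix} A_0 & B_0 \\ B_0 & A_0 \end{bmatrix} \begin{bmatrix} I & 0 \\ 0 & J \end{bmatrix},$$

where $\begin{bmatrix} A_i & B_i \\ B_i & A_i \end{bmatrix}$ are constant orthogonal matrices. $H_p(z)$ is characterized by $2K\binom{M/2}{2}$ parameters. 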


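Characterization of Unitary $H_p(z)$ - Linear Phase and PCS Symmetry 


In this case, $H_p(z)$ is given by 
Equation: 

$$\begin{aligned} H_p(z) &= \begin{bmatrix} I & 0 \\ 0 & J \end{bmatrix} \begin{bmatrix} W_0(z) & D W_0^{R}(z) J \\ D W_0(z) V & (-1)^{M/2} W_0^{R}(z) J V \end{bmatrix} \\ &\stackrel{\mathrm{def}}{=} \frac{1}{\sqrt{2}} \begin{bmatrix} I & 0 \\ 0 & J \end{bmatrix} \begin{bmatrix} W_0'(z) & D (W_0')^{R}(z) J \\ -D W_0'(z) & (W_0')^{R}(z) J \end{bmatrix} P \\ &= \frac{1}{\sqrt{2}} \begin{bmatrix} I & 0 \\ 0 & J \end{bmatrix} \begin{bmatrix} I & D \\ -D & I \end{bmatrix} \begin{bmatrix} W_0'(z) & 0 \\ 0 & (W_0')^{R}(z) \end{bmatrix} \begin{bmatrix} I & 0 \\ 0 & J \end{bmatrix} P. \end{aligned}$$

Therefore we have proved the following Theorem: 


Theorem 53 $H_p(z)$ of order K forms a unitary filter bank with linear-phase and PCS filters iff there 
exists a unitary, order K, M/2 x M/2 matrix $W_0'(z)$ such that 


Equation: 

$$H_p(z) = \frac{1}{\sqrt{2}} \begin{bmatrix} I & D \\ -JD & J \end{bmatrix} \begin{bmatrix} W_0'(z) & 0 \\ 0 & (W_0')^{R}(z) \end{bmatrix} \begin{bmatrix} I & 0 \\ 0 & J \end{bmatrix} P.$$

In this case $H_p(z)$ is determined by precisely $(M/2 - 1)L + \binom{M/2}{2}$ parameters where $L \geq K$ is the 
McMillan degree of $W_0'(z)$. 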


Characterization of Unitary $H_p(z)$ - Linear Phase and PS Symmetry 

From the previous result we have the following result: 


Theorem 54 $H_p(z)$ of order K forms a unitary filter bank with linear-phase and PS filters iff there exists 
a unitary, order K, M/2 x M/2 matrix $W_0'(z)$ such that 


Equation: 

$$H_p(z) = \frac{1}{\sqrt{2}} \begin{bmatrix} I & D \\ -J & JD \end{bmatrix} \begin{bmatrix} W_0'(z) & 0 \\ 0 & (W_0')^{R}(z) \end{bmatrix} \begin{bmatrix} I & 0 \\ 0 & J \end{bmatrix} P.$$

$H_p$ is determined by precisely $(M/2 - 1)L + \binom{M/2}{2}$ parameters where $L \geq K$ is the McMillan degree 
of $W_0'(z)$. 


Notice that the theorems from "Characterization of Unitary $H_p(z)$ - PS Symmetry" through 
"Characterization of Unitary $H_p(z)$ - Linear Phase and PS Symmetry" give a completeness 
characterization for unitary filter banks with the symmetries in question (and the appropriate length 
restrictions on the filters). However, if one requires only the matrices $W_0'(z)$ and $W_1'(z)$ in the above 
theorems to be invertible on the unit circle (and not unitary), then the above results give a method to 
generate nonunitary PR filter banks with the symmetries considered. Notice, however, that in the 
nonunitary case this is not a complete parameterization of all such filter banks. 


Linear-Phase Wavelet Tight Frames 


A necessary and sufficient condition for a unitary (FIR) filter bank to give rise to a compactly supported 
wavelet tight frame (WTF) is that the lowpass filter $h_0$ in the filter bank satisfies the linear constraint 
[link] 

Equation: 

$$\sum_{n} h_0(n) = \sqrt{M}.$$

We now examine and characterize how $H_p(z)$ for unitary filter banks with symmetries can be 
constrained to give rise to wavelet tight frames (WTFs). First consider the case of PS symmetry in which 
case $H_p(z)$ is parameterized in [link]. We have a WTF iff 

Equation: 

$$\text{first row of } H_p(z)\big|_{z=1} = \begin{bmatrix} 1/\sqrt{M} & \cdots & 1/\sqrt{M} \end{bmatrix}.$$

In [link], since P permutes the columns, the first row is unaffected. Hence [link] is equivalent to the first 
rows of both $W_0'(z)$ and $W_1'(z)$ when z = 1 being given by 
Equation: 

$$\begin{bmatrix} \sqrt{2/M} & \cdots & \sqrt{2/M} \end{bmatrix}.$$

This is precisely the condition to be satisfied by a WTF of multiplicity M/2. Therefore both $W_0'(z)$ and 
$W_1'(z)$ give rise to multiplicity M/2 compactly supported WTFs. If the McMillan degrees of $W_0'(z)$ and 
$W_1'(z)$ are $L_0$ and $L_1$ respectively, then they are parameterized respectively by 
$\binom{M/2-1}{2} + (M/2 - 1)L_0$ and $\binom{M/2-1}{2} + (M/2 - 1)L_1$ parameters. In summary, a WTF with PS 
symmetry can be explicitly parameterized by $2\binom{M/2-1}{2} + (M/2 - 1)(L_0 + L_1)$ parameters. Both $L_0$ 
and $L_1$ are greater than or equal to K. 


PS symmetry does not reflect itself as any simple property of the scaling function $\psi_0(t)$ and wavelets 
$\psi_i(t)$, $i \in \{1, \ldots, M-1\}$ of the WTF. However, from design and implementation points of view, PS 
symmetry is useful (because of the reduction in the number of parameters). 


Next consider PCS symmetry. From [link] one sees that [link] is equivalent to the first rows of the 
matrices A and B defined by 
Equation: 

$$\begin{bmatrix} A & B \\ -B & A \end{bmatrix} = \prod_{i=K}^{0} \begin{bmatrix} A_i & B_i \\ -B_i & A_i \end{bmatrix}$$

being of the form $\begin{bmatrix} 1/\sqrt{M} & \cdots & 1/\sqrt{M} \end{bmatrix}$. Here we only have an implicit parameterization of WTFs, unlike 
the case of PS symmetry. As in the case of PS symmetry, there are no simple symmetry relationships 
between the wavelets. 


Now consider the case of linear phase. In this case, it can be seen [link] that the wavelets are also linear 
phase. If we define 
Equation: 

$$\begin{bmatrix} A & B \\ B & A \end{bmatrix} = \prod_{i=K}^{0} \begin{bmatrix} A_i & B_i \\ B_i & A_i \end{bmatrix},$$

then it can be verified that one of the rows of the matrix A + B has to be of the form 
$\begin{bmatrix} \sqrt{2/M} & \cdots & \sqrt{2/M} \end{bmatrix}$. This is an implicit parameterization of the WTF. 


Finally consider the case of linear phase with PCS symmetry. In this case also, the wavelets are linear-phase. 
From [link] it can be verified that we have a WTF iff the first row of $W_0'(z)$ for z = 1 evaluates 
to the vector $\begin{bmatrix} \sqrt{2/M} & \cdots & \sqrt{2/M} \end{bmatrix}$. Equivalently, $W_0'(z)$ gives rise to a multiplicity M/2 WTF. In 
this case, the WTF is parameterized by precisely $\binom{M/2-1}{2} + (M/2 - 1)L$ parameters where $L \geq K$ is 
the McMillan degree of $W_0'(z)$. 


Linear-Phase Modulated Filter Banks 
The modulated filter banks we described 


1. have filters with nonoverlapping ideal frequency responses as shown in [link]. 
2. are associated with DCT III/IV (or equivalently DST III/IV) in their implementation 
3. and do not allow for linear phase filters (even though the prototypes could be linear phase). 


In trying to overcome 3, Lin and Vaidyanathan introduced a new class of linear-phase modulated filter 
banks by giving up 1 and 2 [link]. We now introduce a generalization of their results from a viewpoint 
that unifies the theory of modulated filter banks as seen earlier with the new class of modulated filter 
banks we introduce here. For a more detailed exposition of this viewpoint see [link]. 


The new class of modulated filter banks has 2M analysis filters, but M bands, each band being shared 
by two overlapping filters. The M bands are the M-point Discrete Fourier Transform bands as shown in 
[link]. 

Equation: 

$$k_n = \begin{cases} \frac{1}{\sqrt{2}} & n \in \{0, M\} \\ 1 & \text{otherwise.} \end{cases}$$


Two broad classes of MFBs (that together are associated with all four DCT/DSTs [link]) can be defined. 


Ideal Frequency Responses in an M-band DFT-type Filter Bank (frequency axis $\omega$, with band edges at $\pm\frac{\pi}{2M}$ and $\pm\frac{3\pi}{2M}$) 


DCT/DST I/II based 2M Channel Filter Bank 


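Equation: 

$$h_i(n) = k_i\, h(n) \cos\left(\frac{\pi}{M}\, i \left(n - \frac{\alpha}{2}\right)\right), \quad i \in S_1$$

Equation: 

$$h_{M+i}(n) = k_i\, h(n - M) \sin\left(\frac{\pi}{M}\, i \left(n - \frac{\alpha}{2}\right)\right), \quad i \in S_2$$

Equation: 

$$g_i(n) = k_i\, g(n) \cos\left(\frac{\pi}{M}\, i \left(n + \frac{\alpha}{2}\right)\right), \quad i \in S_1$$

Equation: 

$$g_{M+i}(n) = -k_i\, g(n + M) \sin\left(\frac{\pi}{M}\, i \left(n + \frac{\alpha}{2}\right)\right), \quad i \in S_2$$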


The sets $S_1$ and $S_2$ are defined depending on the parity of $\alpha$ as shown in [link]. When $\alpha$ is even (i.e., 
Type 1 with odd M or Type 2 with even M), the MFB is associated with DCT I and DST I. When $\alpha$ is 
odd (i.e., Type 1 with even M or Type 2 with odd M), the MFB is associated with DCT II and DST II. 
The linear-phase MFBs introduced in [link] correspond to the special case where h(n) = g(n) and $\alpha$ is 
even. The other cases above and their corresponding PR results are new. 


                              $S_1$                              $S_2$ 
$\alpha$ even, DCT/DST I      $\mathcal{R}(M) \cup \{M\}$        $\mathcal{R}(M) \setminus \{0\}$ 
$\alpha$ odd, DCT/DST II      $\mathcal{R}(M)$                   $\mathcal{R}(M) \setminus \{0\} \cup \{M\}$ 


Class A MFB: The Filter Index Sets $S_1$ and $S_2$ 


The PR constraints on the prototype filters h and g (for both versions of the filter banks above) are 
exactly the same as that for the modulated filter bank studied earlier [link]. When the prototype filters are 
linear phase, these filter banks are also linear phase. An interesting consequence is that if one designs an 
M-channel Class B modulated filter bank, the prototype filter can also be used for a Class A 2M channel 
filter bank. 


Linear Phase Modulated Wavelet Tight Frames 


Under what conditions do linear phase modulated filter banks give rise to wavelet tight frames (WTFs)? 
To answer this question, it is convenient to use a slightly different lattice parameterization than that used 
for Class B modulated filter banks. A seemingly surprising result is that some Class A unitary MFBs 
cannot be associated with WTFs. More precisely, a Class A MFB is associated with a WTF only if it is 
Type 1. 
Equation: 

$$\begin{bmatrix} P_{l,0}(z) & P_{l,1}^{R}(z) \\ -P_{l,1}(z) & P_{l,0}^{R}(z) \end{bmatrix} = \sqrt{\frac{2}{M}} \left\{ \prod_{k=k_l-1}^{1} T'(\theta_{l,k}) \right\} T'(\theta_{l,0})$$

where 
Equation: 

$$T'(\theta) = \begin{bmatrix} \cos\theta & z^{-1} \sin\theta \\ -\sin\theta & z^{-1} \cos\theta \end{bmatrix}.$$

With this parameterization we define $\Theta_l$ as follows: 


Equation: 

$$\begin{bmatrix} P_{l,0}(1) & P_{l,1}^{R}(1) \\ -P_{l,1}(1) & P_{l,0}^{R}(1) \end{bmatrix} = \sqrt{\frac{2}{M}} \begin{bmatrix} \cos(\Theta_l) & \sin(\Theta_l) \\ -\sin(\Theta_l) & \cos(\Theta_l) \end{bmatrix},$$

where in the FIR case $\Theta_l = \sum_{k=0}^{k_l-1} \theta_{l,k}$ as before. Type 1 Class A MFBs give rise to a WTF iff $\Theta_l = \frac{\pi}{4}$ for 
all $l \in \mathcal{R}(J)$. 


Theorem 55 (Modulated Wavelet Tight Frames Theorem) A class A MFB of Type 1 gives rise to a 
WTF iff $\Theta_l = \frac{\pi}{4}$. A class B MFB (Type 1 or Type 2) gives rise to a WTF iff $\Theta_l = \frac{\pi}{4} + \frac{\pi}{2M}\left(\frac{\alpha}{2} - l\right)$. 
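
The statement that the lattice evaluated at z = 1 is a rotation by the sum of the lattice angles can be checked numerically. The sketch below assumes each section has the form [[cos θ, sin θ / z], [−sin θ, cos θ / z]] and omits the overall scale factor; the function names and the example angles are hypothetical.

```python
import numpy as np


def lattice_section(theta, z):
    """Assumed lattice section [[cos, sin/z], [-sin, cos/z]]."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s / z], [-s, c / z]], dtype=complex)


def lattice_product(thetas, z):
    """Product of sections; thetas[0] is the rightmost factor."""
    X = np.eye(2, dtype=complex)
    for theta in thetas:
        X = lattice_section(theta, z) @ X
    return X


def angle_sum_from_lattice(thetas):
    """Recover the total angle from the lattice evaluated at z = 1."""
    X1 = lattice_product(thetas, 1.0).real
    return np.arctan2(X1[0, 1], X1[0, 0])


if __name__ == "__main__":
    thetas = [0.3, -0.2, np.pi / 4 - 0.1]   # angles chosen to sum to pi/4
    print(angle_sum_from_lattice(thetas), np.pi / 4)
    Xw = lattice_product(thetas, np.exp(1j * 1.1))
    print(np.round(Xw.conj().T @ Xw, 12))   # identity: unitary on the unit circle
```

In this example the angles sum to π/4, the value required of a Type 1 Class A MFB; only the sum is constrained, so all but one angle per lattice remain free.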


Time-Varying Filter Bank Trees 


Filter banks can be applied in cascade to give a rich variety of decompositions. By appropriately 
designing the filters one can obtain an arbitrary resolution in frequency. This makes them particularly 
useful in the analysis of stationary signals with phenomena residing in various frequency bands or scales. 
However, for the analysis of nonstationary or piecewise stationary signals such filter bank trees do not 
suffice. With this in mind we turn to filter banks for finite-length signals. 


If we had a filter bank theory for finite-length signals, then piecewise stationary signals could be handled by 
considering each homogeneous segment separately. Several approaches to filter banks for finite-length 
signals exist and we follow the one in [link]. If we consider the filter bank tree as a machine that takes an 
input sample(s) and produces an output sample(s) every instant, then one can consider changing the machine 
every instant (i.e., changing the filter coefficients every instant). Alternatively, we could use a fixed 
machine for several instants and then switch to another filter bank tree. The former approach is 
investigated in [link]. We follow the latter approach, which, besides leveraging powerful methods to 
design the constituent filter bank trees switched between, also leads to a theory of wavelet bases on an 
interval [link], [link]. 


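Let $H_p(z)$, the polyphase component matrix of the analysis filter bank of a unitary filter bank, be of the 
form (see [link]) 
Equation: 

$$H_p(z) = \sum_{k=0}^{K-1} h_p(k)\, z^{-k}.$$

It is convenient to introduce the sequence 
$\tilde{x} = [\cdots, x(0), x(-1), \cdots, x(-M+1), x(M), x(M-1), \cdots]$ obtained from x by a permutation. 
Then, 
Equation: 

$$d = \tilde{H} \tilde{x} = \begin{bmatrix} \ddots & \ddots & & & & & \\ \cdots & h_p(K-1) & h_p(K-2) & \cdots & h_p(0) & 0 & \cdots \\ & & & & \ddots & \ddots & \end{bmatrix} \tilde{x},$$

and H is unitary iff $\tilde{H}$ is unitary. [link] induces a factorization of $\tilde{H}$ (and hence H). If $\mathcal{V}_0 = \mathrm{diag}(V_0)$ 
and 
Equation: 

$$\mathcal{V}_i = \begin{bmatrix} \ddots & \ddots & & & \\ & P_i & I - P_i & & \\ & & P_i & I - P_i & \\ & & & \ddots & \ddots \end{bmatrix}, \quad i \in \{1, \ldots, K-1\},$$

then 
Equation: 

$$\tilde{H} = \prod_{i=K-1}^{0} \mathcal{V}_i.$$

The factors $\mathcal{V}_i$, with appropriate modifications, will be used as fundamental building blocks for filter 
banks for finite-length signals. 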


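Now consider a finite input signal $x = [x(0), x(1), \cdots, x(L-1)]$, where L is a multiple of M, and let 
$\tilde{x} = [x(M-1), \cdots, x(0), x(M), \cdots, x(L-1), \cdots, x(L-M)]$. Then, the finite vector d (the 
output signal) is given by 


Equation: 

$$d = \tilde{H} \tilde{x} = \begin{bmatrix} h_p(K-1) & \cdots & h_p(0) & 0 & \cdots & \cdots & 0 \\ 0 & h_p(K-1) & \cdots & h_p(0) & 0 & \cdots & 0 \\ \vdots & & \ddots & & \ddots & & \vdots \\ 0 & \cdots & 0 & h_p(K-1) & \cdots & h_p(0) & 0 \\ 0 & \cdots & \cdots & 0 & h_p(K-1) & \cdots & h_p(0) \end{bmatrix} \tilde{x}.$$

$\tilde{H}$ is an (L − N + M) x L matrix, where N = MK is the length of the filters. Now since the rows of 
$\tilde{H}$ are mutually orthonormal (i.e., $\tilde{H}$ has full row rank L − N + M), one has to append N − M = M(K − 1) rows from the 
orthogonal complement of $\tilde{H}$ to make the map from $\tilde{x}$ to an augmented d unitary. To get a complete 
description of these rows, we turn to the factorization of $H_p(z)$. Define the L x L matrix 
$\mathcal{V}_0 = \mathrm{diag}(V_0)$ and for $i \in \{1, \ldots, K-1\}$ the (L − Mi) x (L − Mi + M) matrices 

Equation: 

$$\mathcal{V}_i = \begin{bmatrix} P_i & I - P_i & 0 & \cdots & \cdots & 0 \\ 0 & P_i & I - P_i & 0 & \cdots & 0 \\ \vdots & & \ddots & \ddots & & \vdots \\ 0 & \cdots & \cdots & 0 & P_i & I - P_i \end{bmatrix};$$

then $\tilde{H}$ is readily verified by induction to be $\prod_{i=K-1}^{0} \mathcal{V}_i$. Since each of the factors (except $\mathcal{V}_0$) has M 
more columns than rows, they can be made unitary by appending appropriate rows. Indeed, $\begin{bmatrix} B_i \\ \mathcal{V}_i \\ C_i \end{bmatrix}$ is 
unitary where $B_i = \begin{bmatrix} \Upsilon_i^{T}(I - P_i) & 0 & \cdots & 0 \end{bmatrix}$ and $C_i = \begin{bmatrix} 0 & \cdots & 0 & \Xi_i^{T} P_i \end{bmatrix}$. Here $\Xi_i$ is the $M \times \delta_i$ 
left unitary matrix that spans the range of $P_i$; i.e., $P_i = \Xi_i \Xi_i^{T}$, and $\Upsilon_i$ is the $M \times (M - \delta_i)$ left 
unitary matrix that spans the range of $I - P_i$; i.e., $I - P_i = \Upsilon_i \Upsilon_i^{T}$. Clearly $[\Upsilon_i \; \Xi_i]$ is unitary. 
Moreover, if we define $T_0 = \mathcal{V}_0$ and for $i \in \{1, \ldots, K-1\}$, 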

Equation: 


rake 


ooo Oo 
oOo oOo O&O 


T5,(¢-1) 
then each of the factors T; is a square unitary matrix of size LD — N + M and 
Equation: 


BiVo 


CiVo 


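is the unitary matrix that acts on the data. The corresponding unitary matrix that acts on $x$ (rather than $\tilde{x}$) 
is of the form $\begin{bmatrix} U \\ H \\ W \end{bmatrix}$, where U has MK − M − Δ rows of entry filters in (K − 1) sets given by [link], 
while W has Δ rows of exit filters in (K − 1) sets given by [link]: 
Equation: 

$$\Upsilon_j^{T} (I - P_j) \begin{bmatrix} h_p^{j}(j-1) J & h_p^{j}(j-2) J & \cdots & h_p^{j}(0) J \end{bmatrix},$$

Equation: 

$$\Xi_j^{T} P_j \begin{bmatrix} h_p^{j}(j-1) J & h_p^{j}(j-2) J & \cdots & h_p^{j}(0) J \end{bmatrix},$$

where J is the exchange matrix (i.e., permutation matrix of ones along the anti-diagonal) and 
Equation: 

$$H_p^{j}(z) = \prod_{i=j-1}^{1} \left[ I - P_i + z^{-1} P_i \right] V_0 \stackrel{\mathrm{def}}{=} \sum_{i=0}^{j-1} h_p^{j}(i)\, z^{-i}.$$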

The rows of U and W form the entry and exit filters respectively. Clearly they are nonunique. The 
input/output behavior is captured in 
Equation: 

$$d = \begin{bmatrix} U \\ H \\ W \end{bmatrix} x.$$

For example, in the case of the four-coefficient Daubechies' filters in [link], there is one entry filter and one exit 
filter. 
Equation: 

$$\begin{bmatrix} 0.8660 & 0.5000 & 0 & 0 \\ -0.1294 & 0.2241 & 0.8365 & 0.4830 \\ -0.4830 & 0.8365 & -0.2241 & -0.1294 \\ 0 & 0 & -0.5000 & 0.8660 \end{bmatrix}.$$

If the input signal is right-sided (i.e., supported in {0, 1, ...}), then the corresponding filter bank would 
only have entry filters. If the filter bank is for left-sided signals one would only have exit filters. Based on 
the above, we can consider switching between filter banks (that operate on infinite extent input signals). 
Consider switching from a one-channel to an M-channel filter bank. Until instant n = −1, the input is 
the same as the output. At n = 0, one switches into an M-channel filter bank as quickly as possible. The 
transition is accomplished by the entry filters (hence the name entry) of the M-channel filter bank. The 
input/output of this time-varying filter bank is 


Equation: 

$$d = \begin{bmatrix} I & 0 \\ 0 & U \\ 0 & H \end{bmatrix} x.$$

Next consider switching from an M-channel filter bank to a one-channel filter bank. Until n = −1, the 
M-channel filter bank is operational. From n = 0 onwards the input leaks to the output. In this case, 
there are exit filters corresponding to flushing the states in the first filter bank implementation at n = 0. 
Equation: 

$$d = \begin{bmatrix} H & 0 \\ W & 0 \\ 0 & I \end{bmatrix} x.$$

Finally, switching from an $M_1$-band filter bank to an $M_2$-band filter bank can be accomplished as 
follows: 


Equation: 

$$d = \begin{bmatrix} H_1 & 0 \\ W_1 & 0 \\ 0 & U_2 \\ 0 & H_2 \end{bmatrix} x.$$

The transition region is given by the exit filters of the first filter bank and the entry filters of the second. 
Clearly the transition filters are abrupt (they do not overlap). One can obtain overlapping transition filters 
as follows: replace them by any orthogonal basis for the row space of the matrix $\begin{bmatrix} W_1 & 0 \\ 0 & U_2 \end{bmatrix}$. For 
example, consider switching between two-channel filter banks with length-4 and length-6 Daubechies' 
filters. In this case, there is one exit filter ($W_1$) and two entry filters ($U_2$). 

Growing a Filter Bank Tree 


Consider growing a filter bank tree at n = 0 by replacing a certain output channel in the tree (point of 
tree growth) by an M channel filter bank. This is equivalent to switching from a one-channel to an M- 
channel filter bank at the point of tree growth. The transition filters associated with this change are related 
to the entry filters of the M-channel filter bank. In fact, every transition filter is the net effect of an entry 
filter at the point of tree growth seen from the perspective of the input rather than the output point at 
which the tree is grown. Let the mapping from the input to the output “growth” channel be as shown in 
[link]. The transition filters are given by the system in [link], which is driven by the entry filters of the 
newly added filter bank. Every transition filter is obtained by running the corresponding time-reversed 
entry filter through the synthesis bank of the corresponding branch of the extant tree. 


Pruning a Filter Bank Tree 


In the more general case of tree pruning, if the map from the input to the point of pruning is given as in 
[link], then the transition filters are given by [link]. 


A Branch of an Existing Tree 


Wavelet Bases for the Interval 

By taking the effective input/output map of an arbitrary unitary time-varying filter bank tree, one readily 
obtains time-varying discrete-time wavelet packet bases. Clearly we have such bases for one-sided and 
finite signals also. These bases are orthonormal because they are built from unitary building blocks. We 
now describe the construction of continuous-time time-varying wavelet bases. What follows is the most 
economical construction (in terms of the number of entry/exit functions) of continuous-time time-varying wavelet bases. 


Transition Filter For Tree Growth 


Wavelet Bases for $L^{2}([0, \infty))$ 


Recall that an M channel unitary filter bank (with synthesis filters $\{h_i\}$) such that $\sum_n h_0(n) = \sqrt{M}$ 
gives rise to an M-band wavelet tight frame for $L^{2}(\mathbb{R})$. If 
Equation: 

$$W_{i,j} = \mathrm{Span}\{\psi_{i,j,k}\} \stackrel{\mathrm{def}}{=} \mathrm{Span}\left\{ M^{j/2} \psi_i(M^{j} t - k) \right\} \quad \text{for } k \in \mathbb{Z},$$

then $W_{0,j}$ forms a multiresolution analysis of $L^{2}(\mathbb{R})$ with 
Equation: 

$$W_{0,j} = W_{0,j-1} \oplus W_{1,j-1} \cdots \oplus W_{M-1,j-1} \quad \forall j \in \mathbb{Z}.$$

In [link], Daubechies outlines an approach due to Meyer to construct a wavelet basis for $L^{2}([0, \infty))$. 
One projects $W_{0,j}$ onto $W_{0,j}^{half}$, which is the space spanned by the restrictions of $\psi_{0,j,k}(t)$ to t > 0. We 
give a different construction based on the following idea. For $k \in \mathbb{N}$, the support of $\psi_{i,j,k}(t)$ is in $[0, \infty)$. 
With this restriction (in [link]) define the spaces $W_{i,j}^{+}$. As $j \to \infty$ (since $W_{0,j} \to L^{2}(\mathbb{R})$), 
$W_{0,j}^{+} \to L^{2}([0, \infty))$. Hence it suffices to have a multiresolution 
analysis for $W_{0,j}^{+}$ to get a wavelet basis for $L^{2}([0, \infty))$. [link] does not hold with $W_{i,j}$ replaced by $W_{i,j}^{+}$ 
because $W_{0,j}^{+}$ is bigger than the direct sum of the constituents at the next coarser scale. Let $\mathcal{U}_{j-1}$ be this 
difference space: 
Equation: 

$$W_{0,j}^{+} = W_{0,j-1}^{+} \oplus W_{1,j-1}^{+} \cdots \oplus W_{M-1,j-1}^{+} \oplus \mathcal{U}_{j-1}.$$

If we can find an orthonormal basis for $\mathcal{U}_j$, then we have a multiresolution analysis for $L^{2}([0, \infty))$. 


Transition Filter For Pruning 


We proceed as follows. Construct entry filters (for the analysis filters) of the filter bank with synthesis
filters $\{h_i\}$. Time-reverse them to obtain entry filters (for the synthesis filters). If $\Delta$ is the McMillan
degree of the synthesis bank, there are $\Delta$ entry filters. Let $u_i(n)$ denote the $i$-th synthesis entry filter.
Define the entry functions
Equation:

$$\mu_l(t) = \sqrt{M} \sum_{k=0}^{L_l - 1} u_l(k)\,\psi_0(Mt - k), \quad l \in \{0, \ldots, \Delta - 1\},$$

where $L_l$ is the length of the entry filter $u_l$.
$\mu_l(t)$ is compactly supported in $\left[0, \frac{L_l - 1}{M} + \frac{N - 1}{M - 1}\right]$. Let $\mathcal{U}_j = \mathrm{Span}\{\mu_{l,j}\} \stackrel{\mathrm{def}}{=} \mathrm{Span}\{M^{j/2}\mu_l(M^j t)\}$.
By considering one stage of the analysis and synthesis stages of this PR filter bank (on right-sided signals),
it readily follows that [link] holds. Therefore
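Equation:

$$\{\psi_{i,j,k} \mid i \in \{1, \ldots, M-1\},\ j \in \mathbb{Z},\ k \in \mathbb{N}\} \cup \{\mu_{i,j} \mid i \in \{0, \ldots, \Delta-1\},\ j \in \mathbb{Z}\}$$

forms a wavelet tight frame for $L^2([0, \infty))$. If one started with an ON basis for $L^2(\mathbb{R})$, the newly
constructed basis is an ON basis for $L^2([0, \infty))$. Indeed, if $\{\psi_0(t - k)\}$ is an orthonormal system, then
Equation:

$$\int_{t \ge 0} \mu_l(t)\,\psi_i(t - n)\,dt = \sum_k u_l(k)\,h_i(k - Mn) = 0,$$

and, for $l \ne m$,
Equation:

$$\int_{t \ge 0} \mu_l(t)\,\mu_m(t)\,dt = \sum_k u_l(k)\,u_m(k) = 0.$$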


The dimension of $\mathcal{U}_j$ is precisely the McMillan degree of the polyphase component matrix of the scaling
and wavelet filters considered as the filters of the synthesis bank. There are precisely as many entry
functions as there are entry filters, and the supports of these functions are explicitly given in terms of the
lengths of the corresponding entry filters. [link] shows the scaling function, wavelet, their integer
translates and the single entry function corresponding to Daubechies four-coefficient wavelets. In this
case, $u_0 = \{-\sqrt{3}/2,\ 1/2\}$.
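The entry filter quoted above can be checked numerically. The following is a minimal sketch, not the authors' program: it assumes the usual ordering and normalization of the Daubechies length-4 scaling filter and the alternating-flip convention for the wavelet filter (both are assumptions, since the text does not restate them here), and the helper name `shifted_inner` is hypothetical. It confirms that $u_0$ has unit norm and is orthogonal to the even shifts of both filters on the right half-line.

```python
import math

SQRT3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)

# Daubechies length-4 scaling filter (assumed ordering and normalization).
h0 = [(1 + SQRT3) / NORM, (3 + SQRT3) / NORM, (3 - SQRT3) / NORM, (1 - SQRT3) / NORM]
# Wavelet filter by the alternating flip h1(n) = (-1)^n h0(N-1-n) (assumed convention).
h1 = [(-1) ** n * h0[len(h0) - 1 - n] for n in range(len(h0))]
# Entry filter quoted in the text.
u0 = [-SQRT3 / 2.0, 0.5]


def shifted_inner(u, h, shift):
    """Return sum_k u(k) h(k - shift), with k running over the support of u."""
    total = 0.0
    for k, uk in enumerate(u):
        idx = k - shift
        if 0 <= idx < len(h):
            total += uk * h[idx]
    return total


norm_sq = sum(v * v for v in u0)
# Inner products with h_i(k - 2n) for n = 0, 1, 2 (M = 2 in the two-band case).
checks = {(name, n): shifted_inner(u0, h, 2 * n)
          for name, h in (("h0", h0), ("h1", h1))
          for n in range(3)}

print("||u0||^2 =", norm_sq)
for key, value in sorted(checks.items()):
    print(key, round(value, 12))
```

Only the unshifted filters overlap the two-tap support of $u_0$, so the shifts $n \ge 1$ vanish trivially; the informative checks are the two $n = 0$ inner products.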


Wavelet Bases for $L^2((-\infty, 0])$


One could start with a wavelet basis for $L^2([0, \infty))$ and reflect all the functions about $t = 0$. This is
equivalent to swapping the analysis and synthesis filters of the filter bank. We give an independent
development. We start with a WTF for $L^2(\mathbb{R})$ with functions
Equation:

$$\psi_i(t) = \sqrt{M} \sum_{k=0}^{N-1} h_i(k)\,\psi_0(Mt + k),$$

supported in $\left[-\frac{N-1}{M-1}, 0\right]$. Scaling and wavelet filters constitute the analysis bank in this case. Let $\Delta$ be
the McMillan degree of the analysis bank and let $\{w_i\}$ be the (analysis) exit filters. Define the exit
functions
Equation:

$$\nu_l(t) = \sqrt{M} \sum_{k=0}^{L_l - 1} w_l(k)\,\psi_0(Mt + k), \quad l \in \{0, \ldots, \Delta - 1\}.$$

Let $\nu_{i,j}(t) = M^{j/2}\nu_i(M^j t)$ denote the scaled exit functions, and let
$W_{i,j}^{-} = \mathrm{Span}\{\psi_{i,j,k}\} \stackrel{\mathrm{def}}{=} \mathrm{Span}\{M^{j/2}\psi_i(M^j t + k)\}$ for $k \in \mathbb{N}$. Then as
$j \to \infty$, $W_{0,j}^{-} \to L^2((-\infty, 0])$ and
Equation:

$$\{\psi_{i,j,k} \mid i \in \{1, \ldots, M-1\},\ j \in \mathbb{Z},\ k \in \mathbb{N}\} \cup \{\nu_{i,j} \mid i \in \{0, \ldots, \Delta-1\},\ j \in \mathbb{Z}\}$$

forms a WTF for $L^2((-\infty, 0])$. Orthonormality of this basis is equivalent to the orthonormality of its
parent basis on the line. An example with one exit function (corresponding to the $M = 3$, $N = 6$ Type 1
modulated WTF obtained earlier) is given in [link].


(a) Entry function $\mu_0(t)$, $\psi_0(t)$ and $\psi_0(t - 1)$ (b) Wavelet $\psi_1(t)$ and $\psi_1(t - 1)$


Segmented Time-Varying Wavelet Packet Bases 


Using the ideas above, one can construct wavelet bases for the interval and consequently segmented
wavelet bases for $L^2(\mathbb{R})$. One can write $\mathbb{R}$ as a disjoint union of intervals and use a different wavelet
basis in each interval. Each interval will be spanned by a combination of scaling functions, wavelets,
and corresponding entry and exit functions. For instance, [link] and [link] together correspond to a
wavelet basis for $L^2(\mathbb{R})$, where a 3-band wavelet basis with length-6 filters is used for $t < 0$ and a 2-band
wavelet basis with length-4 filters is used for $t > 0$. Certainly a degree of overlap between the exit
functions on the left of a transition and entry functions on the right of the transition can be obtained by
merely changing coordinates in the finite-dimensional space corresponding to these functions. Extension
of these ideas to obtain segmented wavelet packet bases is also immediate.


(a) Exit function $\nu_0(t)$, $\psi_0(t)$ and $\psi_0(t + 1)$ (b) Wavelet $\psi_1(t)$ and
$\psi_1(t + 1)$ (c) Wavelet $\psi_2(t)$ and $\psi_2(t + 1)$


Filter Banks and Wavelets—Summary 


Filter banks are structures that allow a signal to be decomposed into subsignals—typically at a lower 
sampling rate. If the original signal can be reconstituted from the subsignals, the filter bank is said to be a 
perfect reconstruction (PR) filter bank. For PR, the analysis and synthesis filters have to satisfy a set of 
bilinear constraints. These constraints can be viewed from three perspectives, viz., the direct, matrix, and 
polyphase formulations. In PR filter bank design one chooses filters that maximize a “goodness” criterion 
and satisfy the PR constraints. 


Unitary filter banks are an important class of PR filter banks—they give orthogonal decompositions of 
signals. For unitary filter banks, the PR constraints are quadratic in the analysis filters since the synthesis 
filters are index-reverses of the analysis filters. All FIR unitary filter banks can be explicitly 
parameterized. This leads to easy design (unconstrained optimization) and efficient implementation. 
Sometimes one can impose structural constraints compatible with the goodness criterion. For example, 
modulated filter banks require that the analysis and synthesis filters be modulates of a single analysis and a single
synthesis prototype filter, respectively. Unitary modulated filter banks exist and can be explicitly
parameterized. This allows one to design and implement filter banks with hundreds of channels easily and 
efficiently. Other structural constraints on the filters (e.g., linear phase filters) can be imposed and lead to 
parameterizations of associated unitary filter banks. Cascades of filter banks (used in a tree structure) can 
be used to recursively decompose signals. 


Every unitary FIR filter bank with an additional linear constraint on the lowpass filter is associated with a 
wavelet tight frame. The lowpass filter is associated with the scaling function, and the remaining filters
are each associated with wavelets. The coefficients of the wavelet expansion of a signal can be computed
using a tree structure where the filter bank is applied recursively along the lowpass filter channel. The
parameterization of unitary filter banks, with a minor modification, gives a parameterization of all
compactly supported wavelet tight frames. In general, wavelets associated with a unitary filter bank are
irregular (i.e., not smooth). By imposing further linear constraints (regularity constraints) on the lowpass
filter, one obtains smooth wavelet bases. Structured filter banks give rise to associated structured wavelet
bases; modulated filter banks are associated with modulated wavelet bases and linear phase filter banks
are associated with linear-phase wavelet bases. When filter banks are cascaded so that all the channels are
recursively decomposed, they are associated with wavelet packet bases.


From a time-frequency analysis point of view, filter bank trees can be used to give arbitrary resolutions
of the frequency. In order to obtain arbitrary temporal resolutions one has to use local bases or switch
between filter bank trees at points in time. Techniques for time-varying filter banks can be used to
generate segmented wavelet bases (i.e., a different wavelet basis for each disjoint segment of the time axis).
Finally, just as unitary filter banks are associated with wavelet tight frames, general PR filter banks, with 
a few additional constraints, are associated with wavelet frames (or biorthogonal bases). 


Calculation of the Discrete Wavelet Transform 


When the wavelet expansion is used as a tool in an abstract mathematical analysis, the infinite sum and the
continuous description of $t \in \mathbb{R}$ are appropriate. As a practical signal processing or numerical analysis tool, however, the
function or signal $f(t)$ in [link] is available only in terms of its samples, perhaps with additional information such
as its being band-limited. In this chapter, we examine the practical problem of numerically calculating the discrete
wavelet transform.


Finite Wavelet Expansions and Transforms 


The wavelet expansion of a signal $f(t)$ as first formulated in [link] is repeated here:
Equation:

$$f(t) = \sum_{k=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} \langle f, \psi_{j,k} \rangle\, \psi_{j,k}(t)$$

where the $\{\psi_{j,k}(t)\}$ form a basis or tight frame for the signal space of interest (e.g., $L^2$). At first glance, this
infinite series expansion seems to have the same practical problems in calculation that an infinite Fourier series or
the Shannon sampling formula has. In a practical situation, this wavelet expansion, where the coefficients are
called the discrete wavelet transform (DWT), is often more easily calculated. Both the time summation over the
index $k$ and the scale summation over the index $j$ can be made finite with little or no error.


The Shannon sampling expansion [link], [link] of a signal with infinite support in terms of $\mathrm{sinc}(t) = \frac{\sin(t)}{t}$
expansion functions
Equation:

$$f(t) = \sum_{n=-\infty}^{\infty} f(Tn)\, \mathrm{sinc}\!\left(\frac{\pi}{T}\,t - \pi n\right)$$

requires an infinite sum to evaluate $f(t)$ at one point because the sinc basis functions have infinite support. This is
not necessarily true for a wavelet expansion where it is possible for the wavelet basis functions to have finite
support and, therefore, only require a finite summation over $k$ in [link] to evaluate $f(t)$ at any point.


The lower limit on scale $j$ in [link] can be made finite by adding the scaling function to the basis set as was done in
[link]. By using the scaling function, the expansion in [link] becomes
Equation:

$$f(t) = \sum_{k=-\infty}^{\infty} \langle f, \varphi_{J_0,k} \rangle\, \varphi_{J_0,k}(t) + \sum_{k=-\infty}^{\infty} \sum_{j=J_0}^{\infty} \langle f, \psi_{j,k} \rangle\, \psi_{j,k}(t),$$

where $j = J_0$ is the coarsest scale that is separately represented. The level of resolution or coarseness to start the
expansion with is arbitrary, as was shown in Chapter: A multiresolution formulation of Wavelet Systems in [link],
[link], and [link]. The space spanned by the scaling function contains all the spaces spanned by the lower
resolution wavelets from $j = -\infty$ up to the arbitrary starting point $j = J_0$. This means
$V_{J_0} = W_{-\infty} \oplus \cdots \oplus W_{J_0 - 1}$. In a practical case, this would be the scale where separating detail becomes
important. For a signal with finite support (or one with very concentrated energy), the scaling function might be
chosen so that the support of the scaling function and the size of the features of interest in the signal being
analyzed were approximately the same.


This choice is similar to the choice of period for the basis sinusoids in a Fourier series expansion. If the period of 
the basis functions is chosen much larger than the signal, much of the transform is used to describe the zero 
extensions of the signal or the edge effects. 


The choice of a finite upper limit for the scale $j$ in [link] is more complicated and usually involves some
approximation. Indeed, for samples of f(t) to be an accurate description of the signal, the signal should be 
essentially bandlimited and the samples taken at least at the Nyquist rate (two times the highest frequency in the 
signal's Fourier transform). 


The question of how one can calculate the Fourier series coefficients of a continuous signal from the discrete 
Fourier transform of samples of the signal is similar to asking how one calculates the discrete wavelet transform 
from samples of the signal. And the answer is similar. The samples must be “dense" enough. For the Fourier series, 
if a frequency can be found above which there is very little energy in the signal (above which the Fourier 
coefficients are very small), that determines the Nyquist frequency and the necessary sampling rate. For the 
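wavelet expansion, a scale must be found above which there is negligible detail or energy. If this scale is $j = J_1$,
the signal can be written
Equation:

$$f(t) \approx \sum_{k=-\infty}^{\infty} \langle f, \varphi_{J_1,k} \rangle\, \varphi_{J_1,k}(t)$$

or, in terms of wavelets, [link] becomes
Equation:

$$f(t) \approx \sum_{k=-\infty}^{\infty} \langle f, \varphi_{J_0,k} \rangle\, \varphi_{J_0,k}(t) + \sum_{k=-\infty}^{\infty} \sum_{j=J_0}^{J_1 - 1} \langle f, \psi_{j,k} \rangle\, \psi_{j,k}(t).$$

This assumes that approximately $f \in V_{J_1}$ or, equivalently, $\| f - P_{J_1} f \| \approx 0$, where $P_{J_1}$ denotes the orthogonal
projection of $f$ onto $V_{J_1}$.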


Given $f(t) \in V_{J_1}$ so that the expansion in [link] is exact, one computes the DWT coefficients in two steps.


1. Projection onto finest scale: Compute $\langle f, \varphi_{J_1,k} \rangle$.
2. Analysis: Compute $\langle f, \psi_{j,k} \rangle$, $j \in \{J_0, \ldots, J_1 - 1\}$, and $\langle f, \varphi_{J_0,k} \rangle$.


For $J_1$ large enough, $\varphi_{J_1,k}(t)$ can be approximated by a Dirac impulse at its center of mass since $\int \varphi(t)\,dt = 1$.
For large $j$ this gives
Equation:

$$2^j \int f(t)\,\varphi(2^j t)\,dt \approx \int f(t)\,\delta(t - 2^{-j} m_0)\,dt = f(2^{-j} m_0)$$

where $m_0 = \int t\,\varphi(t)\,dt$ is the first moment of $\varphi(t)$. Therefore the scaling function coefficients at the $j = J_1$
scale are
Equation:

$$c_{J_1}(k) = \langle f, \varphi_{J_1,k} \rangle = 2^{J_1/2} \int f(t)\,\varphi(2^{J_1} t - k)\,dt = 2^{J_1/2} \int f(t + 2^{-J_1} k)\,\varphi(2^{J_1} t)\,dt$$

which are approximately
Equation:

$$c_{J_1}(k) = \langle f, \varphi_{J_1,k} \rangle \approx 2^{-J_1/2} f\!\left(2^{-J_1}(m_0 + k)\right).$$

For all 2-regular wavelets (i.e., wavelets with two vanishing moments, regular wavelets other than the Haar
wavelets, even in the $M$-band case where one replaces 2 by $M$ in the above equations, $m_0 = 0$), one can show
that the samples of the functions themselves form a third-order approximation to the scaling function coefficients
of the signal [link]. That is, if $f(t)$ is a quadratic polynomial, then
Equation:

$$c_{J_1}(k) = \langle f, \varphi_{J_1,k} \rangle = 2^{-J_1/2} f\!\left(2^{-J_1}(m_0 + k)\right) \approx 2^{-J_1/2} f(2^{-J_1} k).$$


Thus, in practice, the finest scale J; is determined by the sampling rate. By rescaling the function and amplifying 
it appropriately, one can assume the samples of f(t) are equal to the scaling function coefficients. These 
approximations are made better by setting some of the scaling function moments to zero as in the coiflets. These 
are discussed in Section: Approximation of Scaling Coefficients by Samples of the Signal . 
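The Dirac-impulse approximation above can be illustrated with a small numerical sketch. It uses the Haar scaling function only because its inner products have a closed form (the third-order result quoted above excludes Haar); the test signal, the helper names, and the choice of scales are assumptions made for illustration. For Haar, $m_0 = 1/2$, so sampling at the center of mass $2^{-J}(m_0 + k)$ should be noticeably closer to the exact coefficient than sampling at $2^{-J}k$, and the error should shrink as $J$ grows.

```python
import math


def f(t):
    """Test signal: a quadratic polynomial (chosen for illustration)."""
    return 3.0 * t * t - 2.0 * t + 1.0


def f_antiderivative(t):
    """Antiderivative of f, used to integrate exactly."""
    return t ** 3 - t ** 2 + t


def haar_scaling_coefficient(J, k):
    """Exact <f, phi_{J,k}> for the Haar scaling function (indicator of [0, 1))."""
    a = k * 2.0 ** (-J)
    b = (k + 1) * 2.0 ** (-J)
    return 2.0 ** (J / 2.0) * (f_antiderivative(b) - f_antiderivative(a))


def sample_approximation(J, k, m0):
    """The approximation 2^{-J/2} f(2^{-J} (m0 + k))."""
    return 2.0 ** (-J / 2.0) * f(2.0 ** (-J) * (m0 + k))


M0_HAAR = 0.5  # first moment of the Haar scaling function

errors = {}
for J in (2, 4, 6, 8):
    k = 2 ** J // 2  # coefficient whose support starts at t = 0.5
    exact = haar_scaling_coefficient(J, k)
    centered = sample_approximation(J, k, M0_HAAR)
    uncentered = sample_approximation(J, k, 0.0)
    errors[J] = (abs(exact - centered), abs(exact - uncentered))
    print(J, errors[J])
```

In this sketch the centered error falls rapidly with $J$, which is the sense in which a fine enough sampling rate lets the samples stand in for the scaling coefficients.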


Finally there is one other aspect to consider. If the signal has finite support and L samples are given, then we have 
$L$ nonzero coefficients $\langle f, \varphi_{J_1,k} \rangle$. However, the DWT will typically have more than $L$ coefficients since the
wavelet and scaling coefficients are obtained by convolution and downsampling. In other words, the DWT of an $L$-point
signal will have more than $L$ points. Considered as a finite discrete transform of one vector into another, this
situation is undesirable. The reason this "expansion" in dimension occurs is that one is using a basis for $L^2$ to
represent a signal that is of finite duration, say in $L^2[0, P]$.


When calculating the DWT of a long signal, $J_0$ is usually chosen to give the wavelet description of the slowly
changing or longer duration features of the signal. When the signal has finite support or is periodic, $J_0$ is generally
chosen so there is a single scaling coefficient for the entire signal or for one period of the signal. To reconcile the
difference in length of the samples of a finite support signal and the number of DWT coefficients, zeros can be 
appended to the samples of f(t) or the signal can be made periodic as is done for the DFT. 


Periodic or Cyclic Discrete Wavelet Transform 


If $f(t)$ has finite support, create a periodic version of it by
Equation:

$$\tilde f(t) = \sum_{n} f(t + Pn)$$

where the period $P$ is an integer. In this case, $\langle \tilde f, \varphi_{j,k} \rangle$ and $\langle \tilde f, \psi_{j,k} \rangle$ are periodic sequences in $k$ with period $P\,2^j$
(if $j \ge 0$, and 1 if $j < 0$) and
Equation:

$$d(j,k) = 2^{j/2} \int \tilde f(t)\,\psi(2^j t - k)\,dt$$
Equation:

$$d(j,k) = 2^{j/2} \int \tilde f(t + 2^{-j} k)\,\psi(2^j t)\,dt = 2^{j/2} \int \tilde f(t + 2^{-j} \ell)\,\psi(2^j t)\,dt$$

where $\ell = \langle k \rangle_{P 2^j}$ ($k$ modulo $P\,2^j$) and $\ell \in \{0, 1, \ldots, P\,2^j - 1\}$. An obvious choice for $J_0$ is 1. Notice that in
this case, given $L = 2^{J_1}$ samples of the signal, $\langle f, \varphi_{J_1,k} \rangle$, the wavelet transform has exactly
$1 + 1 + 2 + 2^2 + \cdots + 2^{J_1 - 1} = 2^{J_1} = L$ terms. Indeed, this gives a linear, invertible discrete transform which
can be considered apart from any underlying continuous process, much as the discrete Fourier transform exists
apart from the Fourier transform or series.


There are at least three ways to calculate this cyclic DWT and they are based on the equations [link], [link], and
[link] later in this chapter. The first method simply convolves the scaling coefficients at one scale with the time-reversed
coefficients $h(-n)$ to give an $L + N - 1$ length sequence. This is aliased or wrapped as indicated in
[link] and programmed in dwt5.m in Appendix 3. The second method creates a periodic $\tilde c_j(k)$ by concatenating
an appropriate number of $c_j(k)$ sections together, then convolving $h(n)$ with it. That is illustrated in [link] and in
dwt.m in Appendix 3. The third approach constructs a periodic $\tilde h(n)$ and convolves it with $c_j(k)$ to implement
[link]. The Matlab programs should be studied to understand how these ideas are actually implemented.
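As a rough companion to the description above, here is a minimal Python sketch of a cyclic DWT. It is not a transcription of the Matlab programs in the appendix: it simply takes the index of the scaling coefficients modulo the block length, which amounts to filtering a periodic extension of the data. The Daubechies length-4 filter, the alternating-flip wavelet filter, the output ordering, and all function names are assumptions made for illustration.

```python
import math


def daubechies4():
    """Length-4 Daubechies scaling filter, normalized so that sum h(n) = sqrt(2)."""
    s3 = math.sqrt(3.0)
    norm = 4.0 * math.sqrt(2.0)
    return [(1 + s3) / norm, (3 + s3) / norm, (3 - s3) / norm, (1 - s3) / norm]


def wavelet_filter(h):
    """Alternating flip h1(n) = (-1)^n h(N-1-n) (one common convention)."""
    N = len(h)
    return [(-1) ** n * h[N - 1 - n] for n in range(N)]


def cyclic_stage(c, g):
    """Compute sum_m g(m - 2k) c(m) with the index m taken modulo len(c)."""
    L = len(c)
    return [sum(g[n] * c[(n + 2 * k) % L] for n in range(len(g)))
            for k in range(L // 2)]


def cyclic_dwt(x, h):
    """Full cyclic DWT of a length-2^J list: [c, d_coarsest, ..., d_finest]."""
    h1 = wavelet_filter(h)
    c = list(x)
    details = []
    while len(c) > 1:
        d = cyclic_stage(c, h1)   # wavelet coefficients at the next coarser scale
        c = cyclic_stage(c, h)    # scaling coefficients at the next coarser scale
        details = d + details
    return c + details


x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
y = cyclic_dwt(x, daubechies4())
print(len(y), sum(v * v for v in x), sum(v * v for v in y))
```

With an orthonormal filter the output has exactly as many coefficients as the input and the same energy, which is the property that makes the cyclic DWT usable as a square, invertible block transform.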


Because the DWT is not shift-invariant, different implementations of the DWT may appear to give different results 
because of shifts of the signal and/or basis functions. It is interesting to take a test signal and compare the DWT of 
it with different circular shifts of the signal. 
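The suggested experiment is easy to try. The sketch below uses a single Haar analysis stage on a cyclic block (the Haar filter, the test signal, and the helper names are assumptions chosen to keep the arithmetic visible). A shift by one sample changes the wavelet coefficients completely, while a shift by two samples only shifts them by one position.

```python
import math


def haar_stage(x):
    """One Haar analysis stage: returns (scaling, wavelet) coefficients."""
    s = 1.0 / math.sqrt(2.0)
    half = len(x) // 2
    c = [s * (x[2 * k] + x[2 * k + 1]) for k in range(half)]
    d = [s * (x[2 * k] - x[2 * k + 1]) for k in range(half)]
    return c, d


def circular_shift(x, n):
    """Circularly delay the list x by n samples."""
    n %= len(x)
    return x[-n:] + x[:-n] if n else list(x)


x = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
_, d_original = haar_stage(x)
_, d_shift1 = haar_stage(circular_shift(x, 1))
_, d_shift2 = haar_stage(circular_shift(x, 2))

print(d_original)  # the pulse sits inside one Haar pair: no detail energy
print(d_shift1)    # the pulse straddles two pairs: two nonzero details
print(d_shift2)    # an even shift only moves the coefficients
```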


Making $f(t)$ periodic can introduce discontinuities at 0 and $P$. To avoid this, there are several alternative
constructions of orthonormal bases for $L^2[0, P]$ [link], [link], [link], [link]. All of these constructions use (directly
or indirectly) the concept of time-varying filter banks. The basic idea in all these constructions is to retain basis 
functions with support in [0, P], remove ones with support outside [0, P] and replace the basis functions that 
overlap across the endpoints with special entry/exit functions that ensure completeness. These boundary functions 
are chosen so that the constructed basis is orthonormal. This is discussed in Section: Time-Varying Filter Bank 
Trees . Another way to deal with edges or boundaries uses “lifting" as mentioned in Section: Lattices and Lifting. 


Filter Bank Structures for Calculation of the DWT and Complexity 


Given that the wavelet analysis of a signal has been posed in terms of the finite expansion of [link], the discrete 
wavelet transform (expansion coefficients) can be calculated using Mallat's algorithm implemented by a filter bank 
as described in Chapter: Filter Banks and the Discrete Wavelet Transform and expanded upon in Chapter: Filter
Banks and Transmultiplexers . Using the direct calculations described by the one-sided tree structure of filters and 
down-samplers in Figure: Three-Stage Two-Band Analysis Tree allows a simple determination of the 
computational complexity. 


If we assume the length of the sequence of the signal is $L$ and the length of the sequence of scaling filter
coefficients $h(n)$ is $N$, then the number of multiplications necessary to calculate each scaling function and wavelet
expansion coefficient at the next scale, $c(J_1 - 1, k)$ and $d(J_1 - 1, k)$, from the samples of the signal,
$f(Tk) \approx c(J_1, k)$, is $LN$. Because of the downsampling, only half as many are needed to calculate the coefficients at the
next lower scale, $c(J_1 - 2, k)$ and $d(J_1 - 2, k)$, and this repeats until there is only one coefficient at a scale of $j = J_0$.
The total number of multiplications is, therefore,
Equation:

$$\text{Mult} = LN + LN/2 + LN/4 + \cdots + N$$
Equation:

$$= LN\,(1 + 1/2 + 1/4 + \cdots + 1/L) = 2NL - N$$

which is linear in $L$ and in $N$. The number of required additions is essentially the same.
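The closed form can be checked by summing the series exactly as it is written above. This is a small arithmetic sketch only; the function name and the chosen values of $L$ and $N$ are assumptions, and $L$ is taken to be a power of two so that every halving is exact.

```python
def series_multiplications(L, N):
    """Sum LN + LN/2 + LN/4 + ... + N for L a power of two."""
    total = 0
    term = L * N
    while term >= N:
        total += term
        term //= 2
    return total


results = []
for L in (8, 64, 1024):
    for N in (2, 4, 8):
        results.append((L, N, series_multiplications(L, N), 2 * N * L - N))
        print(L, N, results[-1][2], results[-1][3], results[-1][2] / L)
```

The last printed column is the cost per input sample, which stays just below $2N$ regardless of the block length.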


If the length of the signal is very long, essentially infinity, the coarsest scale $J_0$ must be determined from the goals
of the particular signal processing problem being addressed. For this case, the number of multiplications required
per DWT coefficient or per input signal sample is
Equation:

$$\text{Mult/sample} = N\,(2 - 2^{-J_0})$$

Because of the relationship of the scaling function filter $h(n)$ and the wavelet filter $h_1(n)$ at each scale (they are
quadrature mirror filters), operations can be shared between them through the use of a lattice filter structure, which
will almost halve the computational complexity. That is developed in Chapter: Filter Banks and Transmultiplexers
and [link].


The Periodic Case 


In many practical applications, the signal is finite in length (finite support) and can be processed as a single "block,"
much as is done with the Fourier series or discrete Fourier transform (DFT). If the signal to be analyzed is finite in length
such that
Equation:

$$f(t) = \begin{cases} 0 & t \le 0 \\ f(t) & 0 < t < P \\ 0 & P \le t \end{cases}$$

we can construct a periodic signal $\tilde f(t)$ by
Equation:

$$\tilde f(t) = \sum_{n} f(t + Pn)$$

and then consider its wavelet expansion or DWT. This creation of a meaningful periodic function can still be done,
even if $f(t)$ does not have finite support, if its energy is concentrated and some overlap is allowed in [link].


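Periodic Property 1: If $\tilde f(t)$ is periodic with integer period $P$ such that $\tilde f(t) = \tilde f(t + Pn)$, then the scaling
function and wavelet expansion coefficients (DWT terms) at scale $j$ are periodic with period $2^j P$.
Equation:

$$\text{If } \tilde f(t) = \tilde f(t + P) \text{ then } \tilde d_j(k) = \tilde d_j(k + 2^j P)$$

This is easily seen from
Equation:

$$\tilde d_j(k) = \int_{-\infty}^{\infty} \tilde f(t)\,\psi(2^j t - k)\,dt = \int_{-\infty}^{\infty} \tilde f(t + Pn)\,\psi(2^j t - k)\,dt$$

which, with a change of variables, becomes
Equation:

$$= \int_{-\infty}^{\infty} \tilde f(x)\,\psi(2^j (x - Pn) - k)\,dx = \int_{-\infty}^{\infty} \tilde f(x)\,\psi(2^j x - (2^j Pn + k))\,dx = \tilde d_j(k + 2^j Pn)$$

and the same is true for $\tilde c_j(k)$.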


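Periodic Property 2: The scaling function and wavelet expansion coefficients (DWT terms) can be calculated
from the inner product of $\tilde f(t)$ with $\varphi(t)$ and $\psi(t)$ or, equivalently, from the inner product of $f(t)$ with the
periodized $\tilde\varphi(t)$ and $\tilde\psi(t)$.
Equation:

$$\tilde c_j(k) = \langle \tilde f(t), \varphi(t) \rangle = \langle f(t), \tilde\varphi(t) \rangle$$

and
Equation:

$$\tilde d_j(k) = \langle \tilde f(t), \psi(t) \rangle = \langle f(t), \tilde\psi(t) \rangle$$

where $\tilde\varphi(t) = \sum_n \varphi(t + Pn)$ and $\tilde\psi(t) = \sum_n \psi(t + Pn)$.


This is seen from
Equation:

$$\tilde d_j(k) = \int_{-\infty}^{\infty} \tilde f(t)\,\psi(2^j t - k)\,dt = \sum_n \int_0^P f(t)\,\psi(2^j (t + Pn) - k)\,dt = \int_0^P f(t) \sum_n \psi(2^j (t + Pn) - k)\,dt$$
Equation:

$$\tilde d_j(k) = \int_0^P f(t)\,\tilde\psi(2^j t - k)\,dt$$

where $\tilde\psi(2^j t - k) = \sum_n \psi(2^j (t + Pn) - k)$ is the periodized scaled wavelet.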


Periodic Property 3: If $\tilde f(t)$ is periodic with period $P$, then Mallat's algorithm for calculating the DWT
coefficients in [link] becomes
Equation:

$$\tilde c_j(k) = \sum_m h(m - 2k)\,\tilde c_{j+1}(m)$$

or
Equation:

$$\tilde c_j(k) = \sum_m \tilde h(m - 2k)\,c_{j+1}(m)$$

or
Equation:

$$\tilde c_j(k) = \sum_n c_j(k + 2^j Pn)$$

where for [link]
Equation:

$$c_j(k) = \sum_m h(m - 2k)\,c_{j+1}(m)$$

The corresponding relationships for the wavelet coefficients are
Equation:

$$\tilde d_j(k) = \sum_m h_1(m - 2k)\,\tilde c_{j+1}(m) = \sum_m \tilde h_1(m - 2k)\,c_{j+1}(m)$$

or
Equation:

$$\tilde d_j(k) = \sum_n d_j(k + 2^j Pn)$$

where
Equation:

$$d_j(k) = \sum_m h_1(m - 2k)\,c_{j+1}(m)$$


These are very important properties of the DWT of a periodic signal, especially one artificially constructed from a 
nonperiodic signal in order to use a block algorithm. They explain not only the aliasing effects of having a periodic 
signal but how to calculate the DWT of a periodic signal. 
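The agreement between two of the forms in Periodic Property 3 can be checked numerically for one stage. The sketch below assumes a Daubechies length-4 scaling filter and a block of eight scaling coefficients (both are assumptions for illustration, as are the function names). It compares filtering the periodically extended input against filtering the finite input and then wrapping (aliasing) the longer result onto one period of the coarser scale.

```python
import math


def daubechies4():
    """Length-4 Daubechies scaling filter (assumed ordering and normalization)."""
    s3 = math.sqrt(3.0)
    norm = 4.0 * math.sqrt(2.0)
    return [(1 + s3) / norm, (3 + s3) / norm, (3 - s3) / norm, (1 - s3) / norm]


def linear_stage(c, h):
    """c_j(k) = sum_m h(m - 2k) c_{j+1}(m) for finite c_{j+1}; returns {k: value}."""
    N, L = len(h), len(c)
    out = {}
    for k in range(-((N - 1) // 2), (L - 1) // 2 + 1):
        out[k] = sum(h[m - 2 * k] * c[m]
                     for m in range(L) if 0 <= m - 2 * k < N)
    return out


def wrap(seq, period):
    """Alias {k: value} onto one period: sum_n seq(k + period * n)."""
    wrapped = [0.0] * period
    for k, v in seq.items():
        wrapped[k % period] += v
    return wrapped


def cyclic_stage(c, h):
    """sum_m h(m - 2k) c~(m), with c~ the periodic extension of c."""
    L = len(c)
    return [sum(h[n] * c[(n + 2 * k) % L] for n in range(len(h)))
            for k in range(L // 2)]


c_fine = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
h = daubechies4()
via_wrapping = wrap(linear_stage(c_fine, h), len(c_fine) // 2)
via_periodic_input = cyclic_stage(c_fine, h)
print(via_wrapping)
print(via_periodic_input)
```

The period used for wrapping is half the input block length because the coarser scale has half as many coefficients per period.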


Structure of the Periodic Discrete Wavelet Transform 


If $f(t)$ is essentially infinite in length, then the DWT can be calculated as an ongoing or continuous process in
time. In other words, as samples of $f(t)$ come in at a high enough rate to be considered equal to $c_{J_1}(k)$, scaling
function and wavelet coefficients at lower resolutions continuously come out of the filter bank. This is best seen
from the simple two-stage analysis filter bank in Section: Three-Stage Two-Band Analysis Tree . If samples come
in at what is called scale $J_1 = 5$, wavelet coefficients at scale $j = 4$ come out the lower bank at half the input rate.
Wavelet coefficients at $j = 3$ come out the next lower bank at one quarter the input rate and scaling function
coefficients at $j = 3$ come out the upper bank also at one quarter the input rate. It is easy to imagine more stages
giving lower resolution wavelet coefficients at a lower and lower rate depending on the number of stages. The last 
one will always be the scaling function coefficients at the lowest rate. 


For a continuous process, the number of stages and, therefore, the level of resolution at the coarsest scale is
arbitrary. It is chosen to match the nature of the slowest features of the signals being processed. It is important to
remember that the lower resolution scales correspond to a slower sampling rate and a larger translation step in the 
expansion terms at that scale. This is why the wavelet analysis system gives good time localization (but poor 
frequency localization) at high resolution scales and good frequency localization (but poor time localization) at 
low or coarse scales. 


For finite length signals or block wavelet processing, the input samples can be considered as a finite dimensional 
input vector, the DWT as a square matrix, and the wavelet expansion coefficients as an output vector. The 
conventional organization of the output of the DWT places the output of the first wavelet filter bank in the lower 
half of the output vector. The output of the next wavelet filter bank is put just above that block. If the length of the 
signal is two to a power, the wavelet decomposition can be carried until there is just one wavelet coefficient and 
one scaling function coefficient. That scale corresponds to the translation step size being the length of the signal. 
Remember that the decomposition does not have to be carried to that level. It can be stopped at any scale and is still
considered a DWT, and it can be inverted using the appropriate synthesis filter bank (or a matrix inverse). 
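The matrix view can be made concrete with a small sketch. It builds the square matrix of a full Haar DWT by transforming unit vectors and checks that the matrix is orthogonal, so its inverse is its transpose. The Haar filter, the block length of 8, the output ordering (scaling coefficient first, then details from coarsest to finest), and the function names are all assumptions chosen for illustration.

```python
import math


def haar_dwt(x):
    """Full Haar DWT of a length-2^J list: [c, d_coarsest, ..., d_finest]."""
    s = 1.0 / math.sqrt(2.0)
    c = list(x)
    details = []
    while len(c) > 1:
        half = len(c) // 2
        d = [s * (c[2 * k] - c[2 * k + 1]) for k in range(half)]
        c = [s * (c[2 * k] + c[2 * k + 1]) for k in range(half)]
        details = d + details
    return c + details


def dwt_matrix(L):
    """Square matrix W such that W x equals haar_dwt(x)."""
    columns = [haar_dwt([1.0 if i == j else 0.0 for i in range(L)])
               for j in range(L)]
    return [[columns[j][i] for j in range(L)] for i in range(L)]


def gram(W):
    """Return W W^T, which is the identity when W is orthogonal."""
    L = len(W)
    return [[sum(W[i][k] * W[j][k] for k in range(L)) for j in range(L)]
            for i in range(L)]


W = dwt_matrix(8)
G = gram(W)
x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
y_matrix = [sum(W[i][k] * x[k] for k in range(8)) for i in range(8)]
y_direct = haar_dwt(x)
print(y_matrix)
```

The matrix form costs on the order of $L^2$ multiplications per block, so it is useful for analysis rather than as a replacement for the filter bank.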


More General Structures 


The one-sided tree structure of Mallat's algorithm generates the basic DWT. From the filter bank in Section: Three-
Stage Two-Band Analysis Tree , one can imagine putting a pair of filters and downsamplers at the output of the 
lower wavelet bank just as is done on the output of the upper scaling function bank. This can be continued to any 
level to create a balanced tree filter bank. The resulting outputs are “wavelet packets" and are an alternative to the 
regular wavelet decomposition. Indeed, this “growing” of the filter bank tree is usually done adaptively using some 
criterion at each node to decide whether to add another branch or not. 
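A balanced (full) tree is simple to sketch. The example below splits every node with a Haar filter pair to a fixed depth; the Haar filters, the test signal, and the function names are assumptions, and no adaptive pruning criterion is implemented. An alternating signal, which the regular DWT would spread over all of its finest-scale wavelet coefficients, ends up concentrated in a single packet.

```python
import math


def haar_split(x):
    """Split a node into its lowpass and highpass children (Haar filters)."""
    s = 1.0 / math.sqrt(2.0)
    half = len(x) // 2
    low = [s * (x[2 * k] + x[2 * k + 1]) for k in range(half)]
    high = [s * (x[2 * k] - x[2 * k + 1]) for k in range(half)]
    return low, high


def wavelet_packet_leaves(x, depth):
    """Leaves of the balanced filter bank tree after `depth` levels of splitting."""
    nodes = [list(x)]
    for _ in range(depth):
        next_nodes = []
        for node in nodes:
            low, high = haar_split(node)
            next_nodes.extend([low, high])
        nodes = next_nodes
    return nodes


x = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
leaves = wavelet_packet_leaves(x, 2)
for index, leaf in enumerate(leaves):
    print(index, leaf)
```

Here the leaves are listed in the order low-low, low-high, high-low, high-high, and all of the energy lands in the high-low packet.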


Still another generalization of the basic wavelet system can be created by using a scale factor other than two. The 
multiplicity-M scaling equation is 
Equation: 


$$\varphi(t) = \sum_{k} h(k)\,\varphi(Mt - k)$$


and the resulting filter bank tree structure has one scaling function branch and M — 1 wavelet branches at each 
stage with each followed by a downsampler by M. The resulting structure is called an M-band filter bank, and it 


Scaling Functions and Wavelets . 


In many applications, it is the continuous wavelet transform (CWT) that is wanted. This can be calculated by using 
numerical integration to evaluate the inner products in [link] and [link] but that is very slow. An alternative is to 
use the DWT to approximate samples of the CWT much as the DFT can be used to approximate the Fourier series 
or integral [link], [link], [link]. 


As you can see from this discussion, the ideas behind wavelet analysis and synthesis are basically the same as
those behind filter bank theory. Indeed, filter banks can be used to calculate discrete wavelet transforms using
Mallat's algorithm, and certain modifications and generalizations can be more easily seen or interpreted in terms of
filter banks than in terms of the wavelet expansion. The topic of filter banks is developed in Chapter: Filter Banks
and the Discrete Wavelet Transform and in more detail in Chapter: Filter Banks and Transmultiplexers .


Wavelet-Based Signal Processing and Applications 


This chapter gives a brief discussion of several areas of application. It is 
intended to show what areas and what tools are being developed and to give 
some references to books, articles, and conference papers where the topics can 
be further pursued. In other words, it is a sort of annotated bibliography that 
does not pretend to be complete. Indeed, it is impossible to be complete or up- 
to-date in such a rapidly developing new area and in an introductory book. 


In this chapter, we briefly consider the application of wavelet systems from two 
perspectives. First, we look at wavelets as a tool for denoising and compressing 
a wide variety of signals. Second, we very briefly list several problems where 
the application of these tools shows promise or has already achieved significant 
success. References will be given to guide the reader to the details of these 
applications, which are beyond the scope of this book. 


Wavelet-Based Signal Processing 


To accomplish frequency domain signal processing, one can take the Fourier 
transform (or Fourier series or DFT) of a signal, multiply some of the Fourier 
coefficients by zero (or some other constant), then take the inverse Fourier 
transform. It is possible to completely remove certain components of a signal 
while leaving others completely unchanged. The same can be done by using 
wavelet transforms to achieve wavelet-based, wavelet domain signal processing, 
or filtering. Indeed, it is sometimes possible to remove or separate parts of a 
signal that overlap in both time and frequency using wavelets, something 
impossible to do with conventional Fourier-based techniques.


Transform-Based Signal Processor 


The classical paradigm for transform-based signal processing is illustrated in 
[link] where the center “box" could be either a linear or nonlinear operation. 
The “dynamics" of the processing are all contained in the transform and inverse 
transform operation, which are linear. The transform-domain processing 
operation has no dynamics; it is an algebraic operation. By dynamics, we mean 
that a process depends on the present and past, and by algebraic, we mean it 
depends only on the present. An FIR (finite impulse response) filter such as is 
part of a filter bank is dynamic. Each output depends on the current and a finite 
number of past inputs (see [link]). The process of operating point-wise on the 
DWT of a signal is static or algebraic. It does not depend on the past (or future) 
values, only the present. This structure, which separates the linear, dynamic 
parts from the nonlinear static parts of the processing, allows practical and 
theoretical results that are impossible or very difficult using a completely 
general nonlinear dynamic system. 


Linear wavelet-based signal processing consists of the processor block in [link] 
multiplying the DWT of the signal by some set of constants (perhaps by zero). 
If undesired signals or noise can be separated from the desired signal in the 
wavelet transform domain, they can be removed by multiplying their 
coefficients by zero. This allows a more powerful and flexible processing or 
filtering than can be achieved using Fourier transforms. The result of this total 
process is a linear, time-varying processing that is far more versatile than linear, 
time-invariant processing. The next section gives an example of using the 
concentrating properties of the DWT to allow a faster calculation of the FFT. 
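A minimal sketch of linear wavelet-domain processing follows: transform, multiply the coefficients at each scale by a constant, and transform back. The Haar filters, the per-scale gain vector, the test signal, and the function names are assumptions made for illustration; any orthonormal DWT could be substituted.

```python
import math

S = 1.0 / math.sqrt(2.0)


def haar_forward(x):
    """Return (scaling coefficient list, detail levels with the finest first)."""
    c = list(x)
    details = []
    while len(c) > 1:
        half = len(c) // 2
        d = [S * (c[2 * k] - c[2 * k + 1]) for k in range(half)]
        c = [S * (c[2 * k] + c[2 * k + 1]) for k in range(half)]
        details.append(d)
    return c, details


def haar_inverse(c, details):
    """Invert haar_forward by running the synthesis stages coarsest first."""
    c = list(c)
    for d in reversed(details):
        x = []
        for ck, dk in zip(c, d):
            x.extend([S * (ck + dk), S * (ck - dk)])
        c = x
    return c


def process(x, gains):
    """Multiply detail level i (0 = finest) by gains[i]; keep the scaling term."""
    c, details = haar_forward(x)
    scaled = [[g * v for v in d] for g, d in zip(gains, details)]
    return haar_inverse(c, scaled)


x = [2.0, 0.0, 2.0, 0.0, 4.0, 2.0, 4.0, 2.0]
identity = process(x, [1.0, 1.0, 1.0])
smoothed = process(x, [0.0, 1.0, 1.0])  # remove the finest-scale detail
print(identity)
print(smoothed)
```

Setting the finest-scale gain to zero removes the sample-to-sample oscillation but keeps the step between the two halves of the signal, which a linear time-invariant lowpass filter would blur.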


Approximate FFT using the Discrete Wavelet Transform 


In this section, we give an example of wavelet domain signal processing. Rather 
than computing the DFT from the time domain signal using the FFT algorithm, 
we will first transform the signal into the wavelet domain, then calculate the 
FFT, and finally go back to the signal domain which is now the Fourier domain. 


Most methods of approximately calculating the discrete Fourier transform 
(DFT) involve calculating only a few output points (pruning), using a small 
number of bits to represent the various calculations, or approximating the
kernel, perhaps by using CORDIC methods. Here we use the characteristics of the
signal being transformed to reduce the amount of arithmetic. Since the wavelet 
transform concentrates the energy of many classes of signals onto a small 
number of wavelet coefficients, this can be used to improve the efficiency of the 
DFT [link], [link], [link], [link] and convolution [link]. 


Introduction 


The DFT is probably the most important computational tool in signal 
processing. Because of the characteristics of the basis functions, the DFT has 
enormous capacity for the improvement of its arithmetic efficiency [link]. The 
classical Cooley-Tukey fast Fourier transform (FFT) algorithm has a 
complexity of $O(N \log_2 N)$. Thus the Fourier transform and its fast algorithm, 
the FFT, are widely used in many areas, including signal processing and 
numerical analysis. Any scheme to speed up the FFT would be very desirable. 


Although the FFT has been studied extensively, there are still some desired 
properties that are not provided by the classical FFT. Here are some of the 
disadvantages of the FFT algorithm: 


1. Pruning is not easy. When the number of input points or output points is 
small compared to the length of the DFT, a special technique called 
pruning [link] is often used. However, this often requires that the nonzero 
input data be grouped together. Classical FFT pruning algorithms do not 
work well when the few nonzero inputs are randomly located. In other 
words, a sparse signal does not necessarily give rise to a faster algorithm. 

2. No speed versus accuracy tradeoff. It is common to have a situation where 
some error would be allowed if there could be a significant increase in 
speed. However, this is not easy with the classical FFT algorithm. One of 
the main reasons is that the twiddle factors in the butterfly operations are 
unit magnitude complex numbers. So all parts of the FFT structure are of 
equal importance. It is hard to decide which part of the FFT structure to 
omit when error is allowed and the speed is crucial. In other words, the 
FFT is a single speed and single accuracy algorithm. 

3. No built-in noise reduction capacity. Many real world signals are noisy. 
What people are really interested in is the DFT of the signals without the 
noise. The classical FFT algorithm does not have built-in noise reduction 
capacity. Even if other denoising algorithms are used, the FFT requires the 
same computational complexity on the denoised signal. Due to the above 
mentioned shortcomings, the fact that the signal has been denoised cannot 
be easily used to speed up the FFT. 


Review of the Discrete Fourier Transform and FFT 


The discrete Fourier transform (DFT) is defined for a length-N complex data 
sequence by 
Equation: 

$$ X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}, \qquad k = 0, 1, \ldots, N-1, $$

where we use $j = \sqrt{-1}$. There are several ways to derive the different fast 
Fourier transform (FFT) algorithms. It can be done by using index mapping 
[link], by matrix factorization, or by polynomial factorization. In this chapter, 
we only discuss the matrix factorization approach, and only discuss the so- 
called radix-2 decimation in time (DIT) variant of the FFT. 


Instead of repeating the derivation of the FFT algorithm, we show the block 
diagram and matrix factorization, in an effort to highlight the basic idea and 
gain some insight. The block diagram of the last stage of a length-8 radix-2 DIT 
FFT is shown in [link]. First, the input data are separated into even and odd 
groups. Then, each group goes through a length-4 DFT block. Finally, butterfly 
operations are used to combine the shorter DFTs into longer DFTs. 


Last Stage of a Length-8 Radix-2 DIT FFT 


The details of the butterfly operations are shown in [link], where 
$W_N^i = e^{-j 2\pi i / N}$ is called the twiddle factor. All the twiddle factors have 
magnitude one; they lie on the unit circle. This is the main reason that there is no 
complexity versus accuracy tradeoff for the classical FFT. Suppose some of the 
twiddle factors had very small magnitude; then the corresponding branches of 
the butterfly operations could be dropped (pruned) to reduce complexity while 
minimizing the error introduced. Of course the error also depends on the 
value of the data to be multiplied with the twiddle factors. When the value of 
the data is unknown, the best way is to cut off the branches with small twiddle 
factors. 


The computational complexity of the FFT algorithm can be easily established. If 
we let $C_{FFT}(N)$ be the complexity for a length-N FFT, we can show 
Equation: 

$$ C_{FFT}(N) = O(N) + 2\, C_{FFT}(N/2), $$

where $O(N)$ denotes linear complexity. The solution to Equation [link] is well 
known: 
Equation: 

$$ C_{FFT}(N) = O(N \log_2 N). $$

This is a classical case where the divide and conquer approach results in a very 
effective solution. 
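The even/odd split, the two half-length DFTs, and the butterfly stage described above can be sketched in a few lines. This is a minimal illustration under our own naming (`dft`, `fft_dit`, and the test vector are not from the text), assuming the input length is a power of two; it is meant to make the recursion $C_{FFT}(N) = O(N) + 2\,C_{FFT}(N/2)$ visible, not to be an efficient implementation.

```python
import cmath


def dft(x):
    """Direct O(N^2) evaluation of X(k) = sum_n x(n) exp(-j 2 pi n k / N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]


def fft_dit(x):
    """Recursive radix-2 decimation-in-time FFT (length must be a power of two)."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    even = fft_dit(x[0::2])   # length-N/2 DFT of the even-indexed samples
    odd = fft_dit(x[1::2])    # length-N/2 DFT of the odd-indexed samples
    X = [0j] * N
    for i in range(N // 2):
        w = cmath.exp(-2j * cmath.pi * i / N)   # twiddle factor W_N^i, |w| = 1
        X[i] = even[i] + w * odd[i]             # upper butterfly output
        X[i + N // 2] = even[i] - w * odd[i]    # lower butterfly output
    return X


# Hypothetical test vector: compare the fast and the direct transform.
x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 1.5]
max_err = max(abs(a - b) for a, b in zip(fft_dit(x), dft(x)))
```

Each call does $O(N)$ butterfly work and makes two calls of half the length, which is exactly the recursion stated above. Every twiddle factor `w` has unit magnitude, so no butterfly branch is a natural candidate for dropping.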


Butterfly Operations in a Radix-2 DIT FFT 


The matrix point of view gives us additional insight. Let $F_N$ be the $N \times N$ DFT 
matrix; i.e., $F_N(m,n) = e^{-j 2\pi m n / N}$ where $m, n \in \{0, 1, \ldots, N-1\}$. Let $S_N$ 
be the $N \times N$ even-odd separation matrix; e.g., 

Equation: 

$$ S_4 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}. $$

Clearly $S_N^T S_N = I_N$, where $I_N$ is the $N \times N$ identity matrix. Then the DIT 
FFT is based on the following matrix factorization, 

Equation: 

$$ F_N = F_N S_N^T S_N = \begin{bmatrix} I_{N/2} & T_{N/2} \\ I_{N/2} & -T_{N/2} \end{bmatrix} \begin{bmatrix} F_{N/2} & 0 \\ 0 & F_{N/2} \end{bmatrix} S_N, $$

where $T_{N/2}$ is a diagonal matrix with $W_N^i$, $i \in \{0, 1, \ldots, N/2-1\}$ on the 
diagonal. We can visualize the above factorization as 
Equation: 


where we image the real part of DFT matrices, and the magnitude of the 
matrices for butterfly operations and even-odd separations. $N$ is taken to be 128 
here. 


Review of the Discrete Wavelet Transform 


In this section, we briefly review the fundamentals of the discrete wavelet 
transform and introduce the necessary notation for future sections. The details 
of the DWT have been covered in other chapters. 


At the heart of the discrete wavelet transform is a pair of filters h and g, 
lowpass and highpass respectively. They have to satisfy a set of constraints 
(see Figure: Sinc Scaling Function and Wavelet) [link], [link], [link]. The block 
diagram of the DWT is shown in [link]. The input data are first filtered by h 
and g, then downsampled. The same building block is further iterated on the 
lowpass outputs. 


Building Block for the Discrete Wavelet Transform 


The computational complexity of the DWT algorithm can also be easily 
established. Let $C_{DWT}(N)$ be the complexity for a length-N DWT. Since after 
each scale, we only further operate on half of the output data, we can show 
Equation: 

$$ C_{DWT}(N) = O(N) + C_{DWT}(N/2), $$

which gives rise to the solution 
Equation: 

$$ C_{DWT}(N) = O(N). $$


The operation in [link] can also be expressed in matrix form $W_N$; e.g., for the Haar 
wavelet, 

Equation: 

$$ W_4^{Haar} = \frac{\sqrt{2}}{2} \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 \end{bmatrix}. $$

The orthogonality conditions on h and g ensure $W_N^T W_N = I_N$. The matrix 
for the multiscale DWT is formed by $W_N$ for different $N$; e.g., for the three scale 
DWT, 
Equation: 

$$ \begin{bmatrix} \begin{bmatrix} W_{N/4} & \\ & I_{N/4} \end{bmatrix} & \\ & I_{N/2} \end{bmatrix} \begin{bmatrix} W_{N/2} & \\ & I_{N/2} \end{bmatrix} W_N. $$

We could further iterate the building block on some of the highpass outputs. 
This generalization is called the wavelet packets [link]. 
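The filter, downsample, and iterate structure reviewed in this section can be illustrated with the Haar filters. The sketch below is only an assumption-laden example: the function names (`haar_step`, `haar_dwt`, `haar_idwt`), the coefficient ordering (coarsest lowpass first, then highpass bands from coarse to fine), and the test signal are ours, and no boundary handling is needed because the Haar filters have length two.

```python
import math

S = math.sqrt(2.0) / 2.0   # Haar filter tap, sqrt(2)/2


def haar_step(x):
    """One building block: filter with Haar h and g, then downsample by two."""
    half = len(x) // 2
    low = [S * (x[2 * k] + x[2 * k + 1]) for k in range(half)]
    high = [S * (x[2 * k] - x[2 * k + 1]) for k in range(half)]
    return low, high


def haar_dwt(x, scales):
    """Iterate the block on the lowpass output only; O(N) work in total."""
    low, details = list(x), []
    for _ in range(scales):
        low, high = haar_step(low)
        details = high + details          # coarser bands go in front
    return low + details


def haar_idwt(coeffs, scales):
    """Invert haar_dwt using the transpose of each orthogonal stage."""
    low = coeffs[:len(coeffs) >> scales]
    pos = len(low)
    for _ in range(scales):
        high = coeffs[pos:pos + len(low)]
        pos += len(high)
        low = [v for c, d in zip(low, high) for v in (S * (c + d), S * (c - d))]
    return low


# Hypothetical piecewise-constant signal: most coefficients vanish.
x = [4.0, 4.0, 4.0, 4.0, 1.0, 1.0, 1.0, 1.0]
coeffs = haar_dwt(x, 3)
energy_in = sum(v * v for v in x)
energy_out = sum(c * c for c in coeffs)
```

Because each stage is orthogonal, the energy of `coeffs` equals the energy of `x`, and for this piecewise-constant input only two of the eight coefficients are nonzero. This concentration is the property exploited in the rest of the section.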


The Algorithm Development 


The key to the fast Fourier transform is the factorization of $F_N$ into several 
sparse matrices, and one of the sparse matrices represents two DFTs of half the 
length. In a manner similar to the DIT FFT, the following matrix factorization 
can be made: 
Equation: 

$$ F_N = F_N W_N^T W_N = \begin{bmatrix} A_{N/2} & B_{N/2} \\ C_{N/2} & D_{N/2} \end{bmatrix} \begin{bmatrix} F_{N/2} & 0 \\ 0 & F_{N/2} \end{bmatrix} W_N, $$

where $A_{N/2}$, $B_{N/2}$, $C_{N/2}$, and $D_{N/2}$ are all diagonal matrices. The values on 
the diagonal of $A_{N/2}$ and $C_{N/2}$ are the length-N DFT (i.e., frequency response) 
of h, and the values on the diagonal of $B_{N/2}$ and $D_{N/2}$ are the length-N DFT 
of g. We can visualize the above factorization as 
Equation: 


where we image the real part of DFT matrices, and the magnitude of the 
matrices for butterfly operations and the one-scale DWT using length-16 
Daubechies' wavelets [link], [link]. Clearly we can see that the new twiddle 
factors have non-unit magnitudes. 


Last stage of a length-8 DWT based FFT. 


The above factorization suggests a DWT-based FFT algorithm. The block 
diagram of the last stage of a length-8 algorithm is shown in [link]. This scheme 
is iteratively applied to shorter length DFTs to get the full DWT based FFT 
algorithm. The final system is equivalent to a full binary tree wavelet packet 
transform [link] followed by classical FFT butterfly operations, where the new 
twiddle factors are the frequency response of the wavelet filters. 


The detail of the butterfly operation is shown in [link], where 
$i \in \{0, 1, \ldots, N/2-1\}$. Now the twiddle factors are the length-N DFTs of h and g. 
For well-defined wavelet filters, they have well-known properties; e.g., for Daubechies' 
family of wavelets, the frequency responses are monotone, and nearly half of 
their values have magnitude close to zero. This fact can be exploited to achieve a speed 
vs. accuracy tradeoff. The classical radix-2 DIT FFT is a special case of the 
above algorithm when h = [1, 0] and g = [0, 1]. Although they do not satisfy 
some of the conditions required for wavelets, they do constitute a legitimate 
(and trivial) orthogonal filter bank and are often called the lazy wavelets in the 
context of lifting. 
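To make the DWT-based factorization concrete, here is a minimal sketch for the Haar filters only, with our own names (`dwt_fft`, `haar_step`) and test vector. For $h = [1, 1]/\sqrt{2}$ and $g = [1, -1]/\sqrt{2}$ the new twiddle factors work out to $H(m) = (1 + W_N^m)/\sqrt{2}$ and $G(m) = (1 - W_N^m)/\sqrt{2}$; longer filters would additionally need circular filtering, which is not shown here.

```python
import cmath
import math

S = math.sqrt(2.0) / 2.0


def dft(x):
    """Direct DFT used as a reference."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]


def haar_step(x):
    """One-scale Haar DWT: lowpass and highpass outputs, each of length N/2."""
    half = len(x) // 2
    low = [S * (x[2 * k] + x[2 * k + 1]) for k in range(half)]
    high = [S * (x[2 * k] - x[2 * k + 1]) for k in range(half)]
    return low, high


def dwt_fft(x):
    """Exact DFT via one-scale DWT, two half-length DFTs, and new butterflies."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    low, high = haar_step(x)
    C = dwt_fft(low)     # length-N/2 DFT of the lowpass branch
    D = dwt_fft(high)    # length-N/2 DFT of the highpass branch
    X = []
    for m in range(N):
        w = cmath.exp(-2j * cmath.pi * m / N)
        H = S * (1 + w)  # length-N DFT of h at bin m (entries of A and C)
        G = S * (1 - w)  # length-N DFT of g at bin m (entries of B and D)
        X.append(H * C[m % (N // 2)] + G * D[m % (N // 2)])
    return X


x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 1.5]   # hypothetical test vector
max_err = max(abs(a - b) for a, b in zip(dwt_fft(x), dft(x)))
```

The recursion is applied to both branches, which corresponds to the full binary tree wavelet packet structure mentioned above. Unlike the classical twiddle factors, `H` and `G` do not have unit magnitude: `H` vanishes as `m` approaches `N` and `G` vanishes at `m = 0`.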


Butterfly Operations in a Radix-2 DIT FFT 


Computational Complexity 


For the DWT-based FFT algorithm, the computational complexity is of the 
same order as that of the FFT, $O(N \log_2 N)$, since the recursive relation in [link] is 
again satisfied. However, the constant appearing before $N \log_2 N$ depends on 
the wavelet filters used. 


Fast Approximate Fourier Transform 


The basic idea of the fast approximate Fourier transform (FAFT) is pruning; i.e., 
cutting off part of the diagram. Traditionally, when only part of the inputs are 
nonzero, or only part of the outputs are required, the part of the FFT diagram 
where either the inputs are zero or the outputs are undesired is pruned [link], so 
that the computational complexity is reduced. However, the classical pruning 
algorithm is quite restrictive, since for a majority of the applications, both the 
inputs and the outputs are of full length. 


The structure of the DWT-based FFT algorithm can be exploited to generalize 
the classical pruning idea for arbitrary signals. From the input data side, the 
signals are made sparse by the wavelet transform [link], [link], [link], [link]; 
thus approximation can be made to speed up the algorithm by dropping the 
insignificant data. In other words, although the input signals are normally not 
sparse, the DWT creates sparse inputs for the butterfly stages of the FFT. So any 
scheme to prune the butterfly stages for the classical FFT can be used here. Of 
course, the price we have to pay here is the computational complexity of the 
DWT operations. In actual implementation, the wavelets in use have to be 
carefully chosen to balance the benefit of the pruning and the price of the 
transform. Clearly, the optimal choice depends on the class of the data we would 
encounter. 
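The input-side pruning idea can be sketched as follows. This is an illustration under several assumptions of ours, not the authors' implementation: it uses the Haar filters, a single hard threshold `t` applied at every node, and prunes a branch only when all of its coefficients have been set to zero; the names `faft`, `stats`, and the test signal are hypothetical.

```python
import cmath
import math

S = math.sqrt(2.0) / 2.0


def dft(x):
    """Direct DFT used as a reference."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]


def faft(x, t, stats):
    """Haar DWT-based FFT that drops coefficients with magnitude below t."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    half = N // 2
    low = [S * (x[2 * k] + x[2 * k + 1]) for k in range(half)]
    high = [S * (x[2 * k] - x[2 * k + 1]) for k in range(half)]
    spectra = []
    for coeffs in (low, high):
        kept = [c if abs(c) >= t else 0.0 for c in coeffs]   # hard threshold
        if any(kept):
            spectra.append(faft(kept, t, stats))
        else:
            stats["pruned"] += 1      # branch is entirely zero: skip its DFT
            spectra.append(None)
    C, D = spectra
    X = []
    for m in range(N):
        w = cmath.exp(-2j * cmath.pi * m / N)
        acc = 0j
        if C is not None:
            acc += S * (1 + w) * C[m % half]
        if D is not None:
            acc += S * (1 - w) * D[m % half]
        X.append(acc)
    return X


def max_error(a, b):
    return max(abs(u - v) for u, v in zip(a, b))


# Hypothetical input: a step plus a small alternating disturbance.
clean = [4.0] * 8 + [1.0] * 8
noisy = [v + 0.01 * (-1) ** n for n, v in enumerate(clean)]

exact_stats = {"pruned": 0}
exact_err = max_error(faft(noisy, 0.0, exact_stats), dft(noisy))

approx_stats = {"pruned": 0}
approx_err = max_error(faft(noisy, 0.1, approx_stats), dft(noisy))
```

With `t = 0.0` nothing significant is dropped and the result matches the DFT to rounding error. With `t = 0.1` the first-stage highpass branch is pruned, which trades a small error in the spectrum for skipped half-length DFTs. A practical implementation would also have to account for the cost of the DWT stages themselves, as noted above.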


From the transform side, since the twiddle factors of the new algorithm have 
decreasing magnitudes, approximation can be made to speed up the algorithm 
by pruning the sections of the algorithm which correspond to the insignificant 
twiddle factors. The frequency responses of the Daubechies' wavelets are shown 
in [link]. We can see that they are monotone decreasing. As the length increases, 
more and more points are close to zero. It should be noted that those filters are 
not designed for frequency responses. They are designed for flatness at 0 and $\pi$. 
Various methods can be used to design wavelets or orthogonal filter banks 
[link], [link], [link] to achieve better frequency responses. Again, there is a 
tradeoff between the good frequency response of the longer filters and the 
higher complexity required by the longer filters. 


The Frequency Responses of Daubechies' Family of Wavelets (curves shown: length-4, length-8, length-16) 


Computational Complexity 


The wavelet coefficients are mostly sparse, so the inputs of the shorter DFTs are 
sparse. If the implementation scales well with respect to the percentage of the 
significant input (e.g., it uses half of the time if only half of the inputs are 
significant), then we can further lower the complexity. Assume that for $N$ inputs, 
$\alpha N$ of them are significant ($\alpha < 1$); we have 

Equation: 

$$ C_{FAFT}(N) = O(N) + 2\alpha\, C_{FAFT}(N/2). $$

For example, if $\alpha = \frac{1}{2}$, Equation [link] simplifies to 
Equation: 

$$ C_{FAFT}(N) = O(N) + C_{FAFT}(N/2), $$

which leads to 
Equation: 

$$ C_{FAFT}(N) = O(N). $$

So under the above conditions, we have a linear complexity approximate FFT. 
Of course, the complexity depends on the input data, the wavelets we use, the 
threshold value used to drop insignificant data, and the threshold value used to 
prune the butterfly operations. It remains to find a good tradeoff. Also the 
implementation would be more complicated than the classical FFT. 
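As a worked step, the recursion can be unrolled under two simplifying assumptions of ours: the linear term is exactly $cN$ for some constant $c$, and the same fraction $\alpha < 1$ applies at every stage. With $L = \log_2 N$ stages this suggests that the complexity stays linear for any fixed $\alpha < 1$, not only for $\alpha = 1/2$.

```latex
\begin{aligned}
C_{FAFT}(N) &= cN + 2\alpha\, C_{FAFT}(N/2) \\
            &= cN + \alpha\, cN + 4\alpha^{2}\, C_{FAFT}(N/4) \\
            &= cN \sum_{k=0}^{L-1} \alpha^{k} + (2\alpha)^{L}\, C_{FAFT}(1),
               \qquad L = \log_2 N, \\
            &\le \frac{cN}{1-\alpha} + N \alpha^{L}\, C_{FAFT}(1) = O(N).
\end{aligned}
```

For $\alpha = 1$ the geometric sum becomes $L$ and the familiar $O(N \log_2 N)$ of the exact algorithm is recovered.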


Noise Reduction Capacity 


It has been shown that the thresholding of wavelet coefficients has near optimal 
noise reduction property for many classes of signals [link]. The thresholding 
scheme used in the approximation in the proposed FAFT algorithm is exactly 
the hard thresholding scheme used to denoise the data. Soft thresholding can 
also be easily embedded in the FAFT. Thus the proposed algorithm also reduces 
the noise while doing approximation. If we need to compute the DFT of noisy 
signals, the proposed algorithm not only can reduce the numerical complexity 
but also can produce cleaner results. 


Summary 


In the past, the FFT has been used to calculate the DWT [link], [link], [link], 
which leads to an efficient algorithm when filters are infinite impulse response 
(IIR). In this chapter, we did just the opposite — using DWT to calculate FFT. 


We have shown that when no intermediate coefficients are dropped and no 
approximations are made, the proposed algorithm computes the exact result, and 
its computational complexity is of the same order as that of the FFT; i.e., 
$O(N \log_2 N)$. The advantage of our algorithm is twofold. From the input data 
side, the signals are made sparse by the wavelet transform, thus approximation 
can be made to speed up the algorithm by dropping the insignificant data. From 
the transform side, since the twiddle factors of the new algorithm have 
decreasing magnitudes, approximation can be made to speed up the algorithm 
by pruning the section of the algorithm which corresponds to the insignificant 
twiddle factors. Since wavelets are an unconditional basis for many classes of 
signals [link], [link], [link], the algorithm is very efficient and has built-in 
denoising capacity. An alternative approach has been developed by Shentov, 
Mitra, Heute, and Hossen [link], [link] using subband filter banks. 


Nonlinear Filtering or Denoising with the DWT 


Wavelets became known to most engineers and scientists with the publication of 
Daubechies' important paper [link] in 1988. Indeed, the work of Daubechies 
[link], Mallat [link], [link], [link], Meyer [link], [link], and others produced 
beautiful and interesting structures, but many engineers and applied scientists felt 
they had a “solution looking for a problem.” With the recent work of Donoho 
and Johnstone together with ideas from Coifman, Beylkin and others, the field 
is moving into a second phase with a better understanding of why wavelets 
work. This new understanding combined with nonlinear processing not only 
solves currently important problems, but gives the potential of formulating and 
solving completely new problems. We now have a coherence of approach and a 
theoretical basis for the success of our methods that should be extraordinarily 
productive over the next several years. Some of the Donoho and Johnstone 
references are [link], [link], [link], [link], [link], [link], [link], [link], [link], 
[link], [link], [link] and related ones are [link], [link], [link], [link], [link]. Ideas 
from Coifman are in [link], [link], [link], [link], [link], [link], [link]. 


These methods are based on taking the discrete wavelet transform (DWT) of a 
signal, passing this transform through a threshold, which removes the 
coefficients below a certain value, then taking the inverse DWT, as illustrated in 
[link]. They are able to remove noise and achieve high compression ratios 
because of the “concentrating” ability of the wavelet transform. If a signal has 
its energy concentrated in a small number of wavelet dimensions, its 
coefficients will be relatively large compared to any other signal or noise that 
has its energy spread over a large number of coefficients. This means that 
thresholding or shrinking the wavelet transform will remove the low amplitude 
noise or undesired signal in the wavelet domain, and an inverse wavelet 
transform will then retrieve the desired signal with little loss of detail. In 
traditional Fourier-based signal processing, we arrange our signals such that the 
signals and any noise overlap as little as possible in the frequency domain and 
linear time-invariant filtering will approximately separate them. Where their 
Fourier spectra overlap, they cannot be separated. Using linear wavelet or other 
time-frequency or time-scale methods, one can try to choose basis systems such 
that in that coordinate system, the signals overlap as little as possible, and 
separation is possible. 


The new nonlinear method is entirely different. The spectra can overlap as much 
as they want. The idea is to have the amplitude, rather than the location of the 
spectra be as different as possible. This allows clipping, thresholding, and 
shrinking of the amplitude of the transform to separate signals or remove noise. 
It is the localizing or concentrating properties of the wavelet transform that 
make it particularly effective when used with these nonlinear methods. Usually 
the same properties that make a system good for denoising or separation by 
nonlinear methods make it good for compression, which is also a nonlinear 
process. 


Denoising by Thresholding 


We develop the basic ideas of thresholding the wavelet transform using 
Donoho's formulations [link], [link], [link]. Assume a finite length signal with 
additive noise of the form 

Equation: 

$$ y_i = x_i + \epsilon\, n_i, \qquad i = 1, \ldots, N $$

as a finite length signal of observations of the signal $x_i$ that is corrupted by 
i.i.d. zero mean, white Gaussian noise $n_i$ with standard deviation $\epsilon$, i.e., 
$n_i \overset{iid}{\sim} \mathcal{N}(0, 1)$. The goal is to recover the signal $x$ from the noisy observations 
$y$. Here and in the following, $v$ denotes a vector with the ordered elements $v_i$ if 
the index $i$ is omitted. Let $W$ be a left invertible wavelet transformation matrix 
of the discrete wavelet transform (DWT). Then Eq. [link] can be written in the 
transformation domain 
Equation: 

$$ Y = X + N, \quad \text{or,} \quad Y_i = X_i + N_i, $$

where capital letters denote variables in the transform domain, i.e., $Y = Wy$, 
$X = Wx$, and $N = \epsilon W n$. Then the inverse transform matrix $W^{-1}$ exists, and we have 
Equation: 

$$ W^{-1} W = I. $$

The following presentation follows Donoho's approach [link], [link], [link], 
[link], [link] that assumes an orthogonal wavelet transform with a square $W$; 
i.e., $W^{-1} = W^T$. We will use the same assumption throughout this section. 


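Let $\hat{X}$ denote an estimate of $X$, based on the observations $Y$. We consider 
diagonal linear projections 
Equation: 

$$ \Delta = \mathrm{diag}(\delta_1, \ldots, \delta_N), \qquad \delta_i \in \{0, 1\}, \quad i = 1, \ldots, N, $$

which give rise to the estimate 
Equation: 

$$ \hat{x} = W^{-1} \hat{X} = W^{-1} \Delta Y = W^{-1} \Delta W y. $$

The estimate $\hat{X}$ is obtained by simply keeping or zeroing the individual wavelet 
coefficients. Since we are interested in the $l_2$ error we define the risk measure 
Equation: 

$$ R(\hat{X}, X) = E\left[ \| \hat{x} - x \|_2^2 \right] = E\left[ \| W^{-1} (\hat{X} - X) \|_2^2 \right] = E\left[ \| \hat{X} - X \|_2^2 \right]. $$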


Notice that the last equality in Eq. [link] is a consequence of the orthogonality 
of $W$. The optimal coefficients in the diagonal projection scheme are 
$\delta_i = 1_{|X_i| > \epsilon}$[footnote]; i.e., only those values of $Y$ where the corresponding 
elements of $X$ are larger than $\epsilon$ in magnitude are kept, all others are set to zero. 
This leads to the ideal risk 

Equation: 

$$ R_{id}(\hat{X}, X) = \sum_{n=1}^{N} \min\left( X_n^2, \epsilon^2 \right). $$

(Footnote: It is interesting to note that allowing arbitrary $\delta_i \in \mathbb{R}$ improves the 
ideal risk by at most a factor of 2 [link].) 

The ideal risk cannot be attained in practice, since it requires knowledge of $X$, 
the wavelet transform of the unknown vector $x$. However, it does give us a 
lower limit for the $l_2$ error. 


Donoho proposes the following scheme for denoising: 


1. compute the DWT $Y = Wy$ 

2. perform thresholding in the wavelet domain, according to so-called hard 
thresholding 
Equation: 

$$ \hat{X} = T_h(Y, t) = \begin{cases} Y, & |Y| \ge t \\ 0, & |Y| < t \end{cases} $$

or according to so-called soft thresholding 

Equation: 

$$ \hat{X} = T_s(Y, t) = \begin{cases} \mathrm{sgn}(Y)\,(|Y| - t), & |Y| \ge t \\ 0, & |Y| < t \end{cases} $$

3. compute the inverse DWT $\hat{x} = W^{-1} \hat{X}$ 
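The two thresholding rules in step 2 act on each coefficient independently. A minimal sketch follows; the function names and the sample values are ours, and the convention that a coefficient with magnitude exactly equal to `t` is kept is an assumption.

```python
import math


def hard_threshold(Y, t):
    """Keep coefficients with |Y_i| >= t unchanged, zero the rest."""
    return [y if abs(y) >= t else 0.0 for y in Y]


def soft_threshold(Y, t):
    """Zero small coefficients and shrink the remaining ones toward zero by t."""
    return [math.copysign(abs(y) - t, y) if abs(y) >= t else 0.0 for y in Y]


# Hypothetical wavelet coefficients and threshold.
Y = [3.0, -0.4, 0.2, -2.5]
hard = hard_threshold(Y, 0.5)
soft = soft_threshold(Y, 0.5)
```

For these values the hard rule returns `[3.0, 0.0, 0.0, -2.5]` and the soft rule returns `[2.5, 0.0, 0.0, -2.0]`. The soft rule never increases the magnitude of a coefficient, which is the mechanism behind the shrinkage condition discussed next.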


This simple scheme has several interesting properties. Its risk is within a 
logarithmic factor ($\log N$) of the ideal risk for both thresholding schemes and 
properly chosen thresholds $t(N, \epsilon)$. If one employs soft thresholding, then the 
estimate is with high probability at least as smooth as the original function. The 
proof of this proposition relies on the fact that wavelets are unconditional bases 
for a variety of smoothness classes and that soft thresholding guarantees (with 
high probability) that the shrinkage condition $|\hat{X}_i| < |X_i|$ holds. The shrinkage 
condition guarantees that $\hat{x}$ is in the same smoothness class as is $x$. Moreover, 
the soft threshold estimate is the optimal estimate that satisfies the shrinkage 
condition. The smoothness property guarantees an estimate free from spurious 
oscillations which may result from hard thresholding or Fourier methods. Also, 
it can be shown that it is not possible to come closer to the ideal risk than within 
a factor $\log N$. Not only does Donoho's method have nice theoretical properties, 
but it also works very well in practice. 


Some comments have to be made at this point. Similar to traditional approaches 
(e.g., low pass filtering), there is a trade-off between suppression of noise and 
oversmoothing of image details, although to a smaller extent. Also, hard 
thresholding yields better results in terms of the $l_2$ error. That is not surprising 
since the observation value $y_i$ itself is clearly a better estimate for the real value 
$x_i$ than a shrunk value in a zero mean noise scenario. However, the estimated 
function obtained from hard thresholding typically exhibits undesired, spurious 
oscillations and does not have the desired smoothness properties. 


Shift-Invariant or Nondecimated Discrete Wavelet Transform 


As is well known, the discrete wavelet transform is not shift invariant; i.e., there 
is no “simple” relationship between the wavelet coefficients of the original and 
the shifted signal[footnote]. In this section we will develop a shift-invariant 
DWT using ideas of a nondecimated filter bank or a redundant DWT [link], 
[link], [link]. Because this system is redundant, it is not a basis but will be a 
frame or tight frame (see Section: Overcomplete Representations, Frames, 
Redundant Transforms, and Adaptive Bases ). Let $X = Wx$ be the (orthogonal) 
DWT of $x$ and $S_R$ be a matrix performing a circular right shift by $R$ with 
$R \in \mathbb{Z}$. Then 

Equation: 

$$ X_s = W x_s = W S_R x = W S_R W^{-1} X, $$

which establishes the connection between the wavelet transforms of two shifted 
versions of a signal, $x$ and $x_s$, by the orthogonal matrix $W S_R W^{-1}$. As an 
illustrative example, consider [link]. 

(Footnote: Since we deal with finite length signals, we really mean circular shift.) 


Shift Variance of the Wavelet Transform (panels: DWT of skyline; DWT of skyline circular shifted by 1) 


The first and most obvious way of computing a shift invariant discrete wavelet 
transform (SIDWT) is simply computing the wavelet transform of all shifts. 
Usually the two band wavelet transform is computed as follows: 1) filter the 
input signal by a low-pass and a high-pass filter, respectively, 2) downsample 
each filter output, and 3) iterate the low-pass output. Because of the 
downsampling, the number of output values at each stage of the filter bank 
(corresponding to coarser and coarser scales of the DWT) is equal to the 
number of the input values. Precisely $N$ values have to be stored. The 
computational complexity is $O(N)$. Directly computing the wavelet transform 
of all shifts therefore requires the storage of $N^2$ elements and has 
computational complexity $O(N^2)$. 


Beylkin [link], Shensa [link], and the Rice group[footnote] independently 
realized that 1) there are only $N \log N$ different coefficient values among those 
corresponding to all shifts of the input signal and 2) those can be computed with 
computational complexity $N \log N$. This can be easily seen by considering one 
stage of the filter bank. Let 

Equation: 

$$ y = [\, y_0 \;\; y_1 \;\; y_2 \;\; \ldots \;\; y_N \,]^T = h x $$

where $y$ is the output of either the high-pass or the low-pass filter in the analysis 
filter bank, $x$ the input and the matrix $h$ describes the filtering operation. 
Downsampling of $y$ by a factor of two means keeping the even indexed 
elements and discarding the odd ones. Consider the case of an input signal 
shifted by one. Then the output signal is shifted by one as well, and sampling 
with the same operator as before corresponds to keeping the odd-indexed 
coefficients as opposed to the even ones. Thus, the set of data points to be 
further processed is completely different. However, for a shift of the input signal 
by two, the downsampled output signal differs from the output of the nonshifted 
input only by a shift of one. This is easily generalized for any odd and even shift 
and we see that the set of wavelet coefficients of the first stage of the filter bank 
for arbitrary shifts consists of only $2N$ different values. Considering the fact 
that only the low-pass component ($N$ values) is iterated, one recognizes that 
after $L$ stages exactly $LN$ values result. Using the same arguments as in the 
shift variant case, one can prove that the computational complexity is 
$O(N \log N)$. The derivation for the synthesis is analogous. 

(Footnote: Those are the ones we are aware of.) 
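The even/odd argument for one filter bank stage can be checked numerically. The sketch below uses a circular Haar highpass filter as a stand-in for the filtering matrix $h$; the names `circ_shift`, `highpass_all`, `decimated`, and the test signal are ours.

```python
import math

S = math.sqrt(2.0) / 2.0


def circ_shift(x, r):
    """Circular right shift of a list by r samples."""
    r %= len(x)
    return x[-r:] + x[:-r] if r else list(x)


def highpass_all(x):
    """Haar highpass output before downsampling (circular filtering)."""
    N = len(x)
    return [S * (x[n] - x[(n + 1) % N]) for n in range(N)]


def decimated(x):
    """Ordinary DWT stage: filter, then keep the even-indexed outputs."""
    return highpass_all(x)[0::2]


x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]   # hypothetical input
y = highpass_all(x)                             # nondecimated output, N values
even_phase, odd_phase = y[0::2], y[1::2]

from_shift_1 = decimated(circ_shift(x, 1))      # input shifted by one
from_shift_2 = decimated(circ_shift(x, 2))      # input shifted by two
```

Shifting the input by one yields the odd-indexed outputs (up to a circular shift), while shifting by two yields the even-indexed outputs shifted by one. So the nondecimated output `y` already contains the decimated highpass coefficients of every shift; together with the lowpass channel this gives the $2N$ values mentioned above.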


Mallat proposes a scheme for computing an approximation of the continuous 
wavelet transform [link] that turns out to be equivalent to the method described 
above. This has been realized and proved by Shensa [link]. Moreover, Shensa 
shows that Mallat's algorithm exhibits the same structure as the so-called 
algorithme à trous. Interestingly, Mallat's intention in [link] was not in particular 
to overcome the shift variance of the DWT but to get an approximation of the 
continuous wavelet transform. 


In the following, we shall refer to the algorithm for computing the SIDWT as 
the Beylkin algorithm[footnote] since this is the one we have implemented. 
Alternative algorithms for computing a shift-invariant wavelet transform [link] 
are based on the scheme presented in [link]. They explicitly or implicitly try to 
find an optimal, signal-dependent shift of the input signal. Thus, the transform 
becomes shift-invariant and orthogonal but signal dependent and, therefore, 
nonlinear. We mention that the generalization of the Beylkin algorithm to the 
multidimensional case, to an $M$-band multiresolution analysis, and to wavelet 
packets is straightforward. 

(Footnote: However, it should be noted that Mallat published his algorithm earlier.) 


Combining the Shensa-Beylkin-Mallat-à trous Algorithms and Wavelet Denoising 


It was Coifman who suggested that the application of Donoho's method to 
several shifts of the observation combined with averaging yields a considerable 
improvement.[footnote] This statement first led us to the following algorithm: 
1) apply Donoho's method not only to “some” but to all circular shifts of the 
input signal, and 2) average the adjusted output signals. As has been shown in the 
previous section, the computation of all possible shifts can be effectively done 
using Beylkin's algorithm. Thus, instead of using the algorithm just described, 
one simply applies thresholding to the SIDWT of the observation and computes 
the inverse transform. 

(Footnote: A similar remark can be found in [link], p. 53.) 
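The two-step algorithm (denoise every circular shift, then average the realigned outputs) can be written down directly. The sketch below is the brute-force $O(N^2)$ version, not the Beylkin algorithm; it assumes a full-depth Haar DWT, hard thresholding of all coefficients, and a power-of-two length. All names, the seed, the noise level, and the threshold are ours.

```python
import math
import random

S = math.sqrt(2.0) / 2.0


def haar_dwt(x):
    """Full-depth orthogonal Haar DWT (length must be a power of two)."""
    low, details = list(x), []
    while len(low) > 1:
        pairs = [(low[2 * k], low[2 * k + 1]) for k in range(len(low) // 2)]
        details = [S * (a - b) for a, b in pairs] + details
        low = [S * (a + b) for a, b in pairs]
    return low + details


def haar_idwt(X):
    """Inverse of haar_dwt."""
    low, pos = X[:1], 1
    while pos < len(X):
        high = X[pos:pos + len(low)]
        pos += len(high)
        low = [v for c, d in zip(low, high) for v in (S * (c + d), S * (c - d))]
    return low


def circ_shift(x, r):
    """Circular right shift by r (negative r shifts left)."""
    r %= len(x)
    return x[-r:] + x[:-r] if r else list(x)


def denoise(y, t):
    """DWT, hard threshold, inverse DWT."""
    return haar_idwt([c if abs(c) >= t else 0.0 for c in haar_dwt(y)])


def cycle_spin_denoise(y, t):
    """Average the denoised, realigned results over all N circular shifts."""
    N = len(y)
    acc = [0.0] * N
    for r in range(N):
        est = circ_shift(denoise(circ_shift(y, r), t), -r)
        acc = [a + e for a, e in zip(acc, est)]
    return [a / N for a in acc]


# Hypothetical noisy step signal.
random.seed(0)
clean = [4.0] * 8 + [1.0] * 8
noisy = [v + 0.3 * random.gauss(0.0, 1.0) for v in clean]
estimate = cycle_spin_denoise(noisy, 1.0)
```

Averaging over all shifts makes the overall operation commute with circular shifts of the input, which a single thresholded DWT does not. The SIDWT route described above should give the same kind of estimate at $O(N \log N)$ cost instead of the $O(N^2)$ of this loop.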


Before going into details, we want to briefly discuss the differences between 
using the traditional orthogonal and the shift-invariant wavelet transform. 
Obviously, by using more than $N$ wavelet coefficients, we introduce 
redundancy. Several authors stated that redundant wavelet transforms, or 
frames, add to the numerical robustness [link] in case of adding white noise in 
the transform domain; e.g., by quantization. This is, however, different from the 
scenario we are interested in, since 1) we have correlated noise due to the 
redundancy, and 2) we try to remove noise in the transform domain rather than 
considering the effect of adding some noise [link], [link]. 


Performance Analysis 


The analysis of the ideal risk for the SIDWT is similar to that by Guo [link]. 
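Define the sets $A$ and $B$ according to 
Equation: 

$$ A = \{\, i \mid |X_i| \ge \epsilon \,\}, \qquad B = \{\, i \mid |X_i| < \epsilon \,\} $$

and an ideal diagonal projection estimator, or oracle, 
Equation: 

$$ \hat{X}_i = \begin{cases} Y_i = X_i + N_i, & i \in A \\ 0, & i \in B. \end{cases} $$

The pointwise estimation error is then 
Equation: 

$$ \hat{X}_i - X_i = \begin{cases} N_i, & i \in A \\ -X_i, & i \in B. \end{cases} $$

In the following, a vector or matrix indexed by $A$ (or $B$) indicates that only 
those rows are kept that have indices out of $A$ (or $B$). All others are set to zero. 
With these definitions and [link], the ideal risk for the SIDWT can be derived 
Equation: 

$$ \begin{aligned}
R_{id}(\hat{X}, X) &= E\left[ \| W^{-1} (\hat{X} - X) \|_2^2 \right] \\
&= E\left[ \| W^{-1} (N_A - X_B) \|_2^2 \right] \\
&= E\Big[ (N_A - X_B)^T \underbrace{W^{-1\,T} W^{-1}}_{C_{W^{-1}}} (N_A - X_B) \Big] \\
&= E\left[ N_A^T W^{-1\,T} W^{-1} N_A \right] - 2 X_B^T C_{W^{-1}} E\left[ N_A \right] + X_B^T C_{W^{-1}} X_B \\
&= \mathrm{tr}\left[ E\left[ W^{-1} N_A N_A^T W^{-1\,T} \right] \right] + X_B^T C_{W^{-1}} X_B \\
&= \mathrm{tr}\left[ W^{-1} E\left[ W_A\, \epsilon n\, \epsilon n^T\, W_A^T \right] W^{-1\,T} \right] + X_B^T C_{W^{-1}} X_B \\
&= \epsilon^2\, \mathrm{tr}\left[ W^{-1} W_A W_A^T W^{-1\,T} \right] + X_B^T C_{W^{-1}} X_B,
\end{aligned} $$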


where $\mathrm{tr}(X)$ denotes the trace of $X$. For the derivation we have used the fact 
that $N_A = \epsilon W_A n$ and consequently the $N_{A_i}$ have zero mean. Notice that for 
orthogonal $W$ the Eq. [link] immediately specializes to Eq. [link]. Eq. [link] 
depends on the particular signal $X_B$, the transform, $W^{-1}$, and the noise level $\epsilon$. 
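The specialization to the orthogonal case can be spelled out as a short check. It assumes a square orthogonal $W$, so that $W^{-1} = W^T$ and $C_{W^{-1}} = I$, and writes $|A|$ for the number of indices in $A$ (our notation).

```latex
\begin{aligned}
\epsilon^2\, \mathrm{tr}\!\left[ W^{-1} W_A W_A^T W^{-1\,T} \right]
  &= \epsilon^2\, \mathrm{tr}\!\left[ W_A W_A^T\, W W^{T} \right]
   = \epsilon^2\, \mathrm{tr}\!\left[ W_A W_A^T \right]
   = \epsilon^2\, |A|, \\
X_B^T C_{W^{-1}} X_B &= \sum_{i \in B} X_i^2, \\
R_{id}(\hat{X}, X) &= \sum_{i \in A} \epsilon^2 + \sum_{i \in B} X_i^2
   = \sum_{i=1}^{N} \min\!\left( X_i^2, \epsilon^2 \right).
\end{aligned}
```

The second equality in the first line uses the cyclic property of the trace, and $\mathrm{tr}[W_A W_A^T] = |A|$ because the kept rows of an orthogonal matrix have unit norm. The last step uses the definitions of $A$ and $B$: $\epsilon^2$ is the smaller term on $A$ and $X_i^2$ is the smaller term on $B$.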


It can be shown that when using the SIDWT introduced above and the 
thresholding scheme proposed by Donoho (including his choice of the 
threshold) then there exists the same upper bound for the actual risk as for the case 
of the orthogonal DWT. That is the ideal risk times a logarithmic (in $N$) factor. 
We give only an outline of the proof. Johnstone and Silverman state [link] that 
for colored noise an oracle chooses $\delta_i = 1_{|X_i| \ge \epsilon_i}$, where $\epsilon_i$ is the standard 
deviation of the $i$th component. Since Donoho's method applies uniform 
thresholding to all components, one has to show that the diagonal elements of 
$C_{W^{-1}}$ (the variances of the components of $N$) are identical. This can be shown 
by considering the reconstruction scheme of the SIDWT. With these statements, 
the rest of the proof can be carried out in the same way as the one given by 
Donoho and Johnstone [link]. 


Examples of Denoising 


The two examples illustrated in [link] show how wavelet based denoising 
works. The first shows a chirp or doppler signal which has a changing 
frequency and amplitude. Noise is added to this chirp in (b) and the result of 
basic Donoho denoising is shown in (c) and of redundant DWT denoising in (d). 
First, notice how well the noise is removed and at almost no sacrifice in the 
signal. This would be impossible with traditional linear filters. 


The second example is the Houston skyline where the improvement of the 
redundant DWT is more obvious. 
Example of Noise Reduction using $\psi_{D8'}$ (panel titles recovered from the figure: (b) Noisy Doppler, (f) Noisy Houston Skyline, (g) DWT Denoised Skyline, (h) RDWT Denoised Skyline) 


Statistical Estimation 


This problem is very similar to the signal recovery problem; a signal has to be
estimated from additive white Gaussian noise. By linearity, additive noise is
additive in the transform domain, where the problem becomes: estimate $\theta$ from
$y = \theta + \epsilon z$, where $z$ is a noise vector (with each component being a zero-mean,
unit-variance Gaussian random variable) and $\epsilon > 0$ is a scalar noise level. The
performance measured by the mean squared error (by Parseval) is given by
Equation:

$$R_\epsilon(\hat\theta, \theta) = E\,\|\hat\theta(y) - \theta\|_2^2 .$$

It depends on the signal ($\theta$), the estimator $\hat\theta$, the noise level $\epsilon$, and the basis.


For a fixed $\epsilon$, the optimal minimax procedure is the one that minimizes the error
for the worst possible signal from the coefficient body $\Theta$.
Equation:

$$R_\epsilon^*(\Theta) = \inf_{\hat\theta}\, \sup_{\theta \in \Theta} R_\epsilon(\hat\theta, \theta).$$


Consider the particular nonlinear procedure $\hat\theta$ that corresponds to soft-thresholding
of every noisy coefficient $y_i$:
Equation:

$$T_\epsilon(y_i) = \mathrm{sgn}(y_i)\,(|y_i| - \epsilon)_+ .$$

Let $r_\epsilon(\theta)$ be the corresponding error for signal $\theta$ and let $r_\epsilon^*(\Theta)$ be the worst-
case error for the coefficient body $\Theta$.
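As a minimal sketch of the soft-thresholding rule above, the following Python function shrinks every coefficient toward zero by the threshold and sets small coefficients to zero. The function name `soft_threshold` and the sample values are illustrative assumptions, not taken from the text.

```python
def soft_threshold(y, eps):
    """Apply sgn(y_i) * max(|y_i| - eps, 0) to every coefficient in y."""
    out = []
    for yi in y:
        shrunk = max(abs(yi) - eps, 0.0)      # the (|y_i| - eps)_+ part
        out.append(shrunk if yi >= 0 else -shrunk)
    return out


# Hypothetical noisy coefficients and threshold, for illustration only.
noisy = [3.0, -2.5, 0.4, -0.1]
denoised = soft_threshold(noisy, 1.0)
```

Coefficients with magnitude below the threshold are removed, and the remaining ones are pulled toward zero by the same amount, which is what distinguishes soft from hard thresholding.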


If the coefficient body is solid, orthosymmetric in a particular basis, then
asymptotically ($\epsilon \to 0$) the error decays at least as fast in this basis as in any
other basis. That is, $r_\epsilon(\Theta)$ approaches zero at least as fast as $r_\epsilon(U\Theta)$ for any
orthogonal matrix $U$. Therefore, unconditional bases are nearly optimal
asymptotically. Moreover, for small $\epsilon$ we can relate this procedure to any other
procedure as follows [link]:
Equation:

$$R_\epsilon^*(\Theta) \le r_\epsilon^*(\Theta) \le O(\log(1/\epsilon)) \cdot R_\epsilon^*(\Theta), \qquad \epsilon \to 0 .$$


Signal and Image Compression 


Fundamentals of Data Compression 


From basic information theory, we know the minimum average number of bits
needed to represent realizations of an independent and identically distributed
discrete random variable $X$ is its entropy $H(X)$ [link]. If the distribution $p(X)$ is
known, we can design Huffman codes or use the arithmetic coding method to
achieve this minimum [link]. Otherwise we need to use an adaptive method [link].
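To make the entropy bound concrete, here is a small sketch that computes $H(X) = -\sum_x p(x) \log_2 p(x)$ in bits for a known distribution. The function name `entropy_bits` and the example distributions are assumptions chosen for illustration.

```python
import math


def entropy_bits(p):
    """Entropy in bits of a discrete distribution given as a list of probabilities."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


fair = entropy_bits([0.5, 0.5])      # two equally likely symbols
skewed = entropy_bits([0.9, 0.1])    # a skewed binary source needs fewer bits
```

The skewed source has a lower entropy than the fair one, so a good entropy coder spends fewer bits per symbol on it.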


Continuous random variables require an infinite number of bits to represent, so
quantization is always necessary for practical finite representation. However,
quantization introduces error. Thus the goal is to achieve the best rate-distortion
tradeoff [link], [link], [link]. Text compression [link], waveform coding [link]
and subband coding [link] have been studied extensively over the years. Here
we concentrate on wavelet compression or, more generally, transform coding.
We also concentrate on low bit rates.


Prototype Transform Coder 


Prototype Transform Coder 


The simple three-step structure of a prototype transform coder is shown in
[link]. The first step is the transform of the signal. For a length-$N$ discrete
signal $f(n)$, we expand it using a set of orthonormal basis functions as
Equation:

$$f(n) = \sum_{i=1}^{N} c_i\, \varphi_i(n),$$

where
Equation:

$$c_i = \langle f(n), \varphi_i(n) \rangle .$$


We then use the uniform scalar quantizer $Q$ as in [link], which is widely used
for wavelet based image compression [link], [link],
Equation:

$$\hat{c}_i = Q(c_i).$$

Denote the quantization step size as $T$. Notice in the figure that the quantizer
has a dead zone, so if $|c_i| < T$, then $Q(c_i) = 0$. We define an index set for those
insignificant coefficients, $\mathcal{I} = \{i : |c_i| < T\}$.


Uniform Scalar Quantizer


Let $M$ be the number of coefficients with magnitudes
greater than $T$ (significant coefficients). Thus the size of $\mathcal{I}$ is $N - M$. The
squared error caused by the quantization is

Equation:

$$\sum_{i=1}^{N} (c_i - \hat{c}_i)^2 = \sum_{i \in \mathcal{I}} c_i^2 + \sum_{i \notin \mathcal{I}} (c_i - \hat{c}_i)^2 .$$

Since the transform is orthonormal, it is the same as the reconstruction error.
Assume $T$ is small enough, so that the significant coefficients are uniformly
distributed within each quantization bin. Then the second term in the error
expression is

Equation:

$$\sum_{i \notin \mathcal{I}} (c_i - \hat{c}_i)^2 = M\, \frac{T^2}{12} .$$

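The two error terms can be illustrated with a short sketch of a dead-zone uniform quantizer. It assumes that significant coefficients are reconstructed at the midpoint of their bin, which the text does not specify; the names `deadzone_quantize` and `error_terms` are hypothetical. The last lines check numerically that values spread uniformly over one bin give a mean squared error close to $T^2/12$.

```python
import math


def deadzone_quantize(c, T):
    """Map |c| < T to zero; otherwise reconstruct at the midpoint of the bin (assumed)."""
    if abs(c) < T:
        return 0.0
    level = (math.floor(abs(c) / T) + 0.5) * T
    return math.copysign(level, c)


def error_terms(coeffs, T):
    """Return (error from insignificant coefficients, error from significant ones, M)."""
    insignificant = sum(c * c for c in coeffs if abs(c) < T)
    significant = sum((c - deadzone_quantize(c, T)) ** 2 for c in coeffs if abs(c) >= T)
    M = sum(1 for c in coeffs if abs(c) >= T)
    return insignificant, significant, M


# Illustrative coefficients: two fall in the dead zone, three are significant.
first_term, second_term, M = error_terms([0.2, -0.7, 1.3, 2.9, -3.6], 1.0)

# Coefficients spread evenly over the bin [1, 2) give a mean error near T^2 / 12.
grid = [1.0 + (k + 0.5) / 1000 for k in range(1000)]
mse_in_bin = sum((c - deadzone_quantize(c, 1.0)) ** 2 for c in grid) / len(grid)
```

The first term is the energy of the coefficients that were set to zero, which is why a basis that concentrates energy in few coefficients keeps this term small.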

For the first term, we need the following standard approximation theorem [link]
that relates it to the $\ell_p$ norm of the coefficients,
Equation:

$$\|f\|_p = \left( \sum_{i=1}^{N} |c_i|^p \right)^{1/p} .$$

Theorem 56 Let $\lambda = \frac{1}{p} > \frac{1}{2}$; then
Equation:

$$\sum_{i \in \mathcal{I}} c_i^2 \le \frac{\|f\|_p^2}{2\lambda - 1}\, M^{1 - 2\lambda} .$$

This theorem can be generalized to infinite dimensional space if $\|f\|_p^2 < +\infty$.
It has been shown that for functions in a Besov space, $\|f\|_p^2 < +\infty$ does not
depend on the particular choice of the wavelet as long as each wavelet in the
basis has $q > \lambda - \frac{1}{2}$ vanishing moments and is $q$ times continuously
differentiable [link]. The Besov space includes piecewise regular functions that
may include discontinuities. This theorem indicates that the first term of the
error expression decreases very fast when the number of significant coefficients
increases.


The bit rate of the prototype compression algorithm can also be separated into two
parts. For the first part, we need to indicate whether each coefficient is
significant; this is also known as the significant map. For example, we could use 1 for
significant, and 0 for insignificant. We need a total of $N$ of these indicators. For
the second part, we need to represent the values of the significant coefficients.
We only need $M$ values. Because the distributions of the values and the
indicators are not known in general, adaptive entropy coding is often used
[link].


Energy concentration is one of the most important properties for low bit rate
transform coding. Suppose that, for the same quantization step size $T$, we have a
second basis that generates fewer significant coefficients. The distribution of
the significant map indicators is then more skewed and thus requires fewer bits to code.
Also, we need to code fewer significant values, which may also require fewer
bits. At the same time, a smaller $M$ reduces the second error term as in [link].
Overall, it is very likely that the new basis improves the rate-distortion
performance. Wavelets have a better energy concentration property than the
Fourier transform for signals with discontinuities. This is one of the main
reasons that wavelet based compression methods usually outperform DCT
based JPEG, especially at low bit rates.
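The energy concentration argument can be sketched numerically. The example below assumes the simplest orthonormal wavelet (Haar) and a unitary DFT, and counts how many coefficients of a step signal exceed a threshold $T$ in each basis. The signal, the threshold, and the function names are illustrative assumptions; the section's own examples use longer Daubechies wavelets.

```python
import cmath
import math


def haar_dwt(x):
    """Orthonormal Haar DWT of a sequence whose length is a power of two."""
    approx = list(x)
    details = []
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        a = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in pairs]
        d = [(approx[i] - approx[i + 1]) / math.sqrt(2) for i in pairs]
        details = d + details      # coarser details go in front of finer ones
        approx = a
    return approx + details


def unitary_dft(x):
    """Direct O(N^2) DFT scaled by 1/sqrt(N) so that energy is preserved."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / math.sqrt(n)
            for k in range(n)]


def count_significant(coeffs, T):
    return sum(1 for c in coeffs if abs(c) > T)


signal = [1.0] * 21 + [0.0] * 43    # one discontinuity, not aligned to a dyadic boundary
T = 0.05
n_haar = count_significant(haar_dwt(signal), T)
n_dft = count_significant(unitary_dft(signal), T)
```

For this signal the Haar transform can have at most one nonzero detail coefficient per scale, while the discontinuity spreads energy over most of the DFT coefficients, so `n_haar` is much smaller than `n_dft`.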


Improved Wavelet Based Compression Algorithms 


The above prototype algorithm works well [link], [link], but its various
building blocks can be further improved [link]. As we can see from [link], the
significant map still has considerable structure, which could be exploited.
Modifications and improvements use the following ideas:


• Insignificant coefficients are often clustered together. In particular, they often
cluster around the same location across several scales. Since the distance
between nearby coefficients doubles for every scale, the insignificant
coefficients often form a tree shape, as we can see from Figure: Discrete
Wavelet Transform of the Houston Skyline, using $\psi_{D8'}$ with a Gain of $\sqrt{2}$
for Each Higher Scale. These so-called zero-trees can be exploited [link],
[link] to achieve excellent results.

• The choice of basis is very important. Methods have been developed to
adaptively choose the basis for the signal [link], [link]. Although they
can be computationally very intensive, substantial improvement can be
realized.

• Special run-length codes could be used to code the significant map and values
[link], [link].

• Advanced quantization methods could be used to replace the simple scalar
quantizer [link].

• Methods based on statistical analysis such as classification, modeling,
estimation, and prediction also produce impressive results [link].

• Instead of using one fixed quantization step size, we can successively
refine the quantization by using smaller and smaller step sizes. These
embedded schemes allow both the encoder and the decoder to stop at any
bit rate [link], [link].

• The wavelet transform could be replaced by an integer-to-integer wavelet
transform; then no quantization is necessary, and the compression is lossless
[link].


The Significant Map for the Lenna image.


Other references are: [link], [link], [link], [link], [link], [link], [link], [link],
[link], [link], [link], [link], [link], [link].


Why are Wavelets so Useful? 


The basic wavelet in wavelet analysis can be chosen so that it is smooth, where
smoothness is measured in a variety of ways [link]. To represent $f(t)$ with $K$
derivatives, one can choose a wavelet $\psi(t)$ that is $K$ (or more) times
continuously differentiable; the penalty for imposing greater smoothness in this
sense is that the supports of the basis functions, the filter lengths and hence the
computational complexity all increase. Besides, smooth wavelet bases are also
the “best bases” for representing signals with arbitrarily many singularities
[link], a remarkable property.


The usefulness of wavelets in representing functions in these and several other 
classes stems from the fact that for most of these spaces the wavelet basis is an 
unconditional basis, which is a near-optimal property. 


To complete this discussion, we have to motivate the property of an
unconditional basis being asymptotically optimal for a particular problem, say
data compression [link]. [link] suggests why a basis in which the coefficients
are solid and orthosymmetric may be desired. The signal class is defined to be
the interior of the rectangle bounded by the lines $x = \pm a$ and $y = \pm b$. The
signal corresponding to point A is the worst-case signal for the two bases shown
in the figure; the residual error (with $n = 1$) is given by $a \sin(\theta) + b \cos(\theta)$
for $\theta \in \{0, \alpha\}$ and is minimized by $\theta = 0$, showing that the orthosymmetric
basis is preferred. This result is really a consequence of the fact that $a \ne b$
(which is typically the case when one uses transform coding; if $a = b$, it turns
out that the “diagonal” basis with $\theta = \pi/4$ is optimal for $n = 1$). The closer the
coefficient body is to a solid, orthosymmetric body with varying side lengths,
the less the individual coefficients are correlated with each other and the greater
the compression in this basis.


In summary, the wavelet bases have a number of useful properties: 


1. They can represent smooth functions.

2. They can represent singularities.

3. The basis functions are local. This makes most coefficient-based
algorithms naturally adaptive to inhomogeneities in the function.

4. They have the unconditional basis (or near optimal in a minimax sense)
property for a variety of function classes, implying that if one knows very
little about a signal, the wavelet basis is usually a reasonable choice.

5. They are computationally inexpensive (perhaps one of the few really
useful linear transforms with a complexity that is $O(N)$), as compared to a
Fourier transform, which is $O(N \log N)$, or an arbitrary linear transform,
which is $O(N^2)$.

6. Nonlinear soft-thresholding is near optimal for statistical estimation.

7. Nonlinear soft-thresholding is near optimal for signal recovery.

8. Coefficient vector truncation is near optimal for data compression.


Optimal Basis for Data Compression


Applications 


Listed below are several application areas in which wavelet methods have had 
some success. 


Numerical Solutions to Partial Differential Equations 


The use of wavelets as basis functions for the discretization of PDEs has had
excellent success. They seem to give a generalization of finite element methods
with some characteristics of multigrid methods. It seems to be the localizing
ability of wavelet expansions that gives rise to sparse operators and good
numerical stability of the methods [link], [link], [link], [link], [link], [link],
[link], [link], [link], [link].


Seismic and Geophysical Signal Processing 


One of the exciting application areas of wavelet-based signal processing is
seismic and geophysical signal processing. Applications of denoising,
compression, and detection are all important here, especially with
higher-dimensional signals and images. Some of the references can be found in [link],
[link], [link], [link], [link], [link], [link], [link], [link], [link], [link], [link].


Medical and Biomedical Signal and Image Processing 


Another exciting application of wavelet-based signal processing is in medical
and biomedical signal and image processing. Again, applications of denoising,
compression, and detection are all important here, especially with
higher-dimensional signals and images. Some of the references can be found in [link],
[link], [link].


Application in Communications 


Some applications of wavelet methods to communications problems are in
[link], [link], [link], [link], [link].


Fractals 


Wavelet-based signal processing has been combined with fractals and applied to
systems that are chaotic [link], [link], [link], [link], [link], [link], [link], [link].
The multiresolution formulation of the wavelet and the self-similar
characteristic of certain fractals make the wavelet a natural tool for this analysis.
An application to noise removal from music is in [link].


Other applications are to the automatic target recognition (ATR) problem, and
many other questions.


Wavelet Software 


There are several software packages available to study, experiment with, and
apply wavelet signal analysis. MathWorks, Inc. has a Wavelet Toolbox [link]; Donoho's group at
Stanford has WaveLab; the Yale group has XWPL and WPLab [link]; Taswell
at Stanford has WavBox [link]; a group in Spain has Uvi-Wave; MathSoft, Inc.
has S+WAVELETS; Aware, Inc. has WaveTool; and the DSP group at Rice has
a Matlab wavelet toolbox available over the internet at http://www-dsp.rice.edu.
There is a good description and list of several wavelet software packages in
[link]. There are several Matlab programs in Appendix C of this book. They
were used to create the various examples and figures in this book and should be
studied when studying the theory of a particular topic.


Summary Overview 


Properties of the Basic Multiresolution Scaling Function 


The first summary is given in four tables of the basic relationships and equations, primarily developed in Chapter:
The Scaling Function and Scaling Coefficients, Wavelet and Wavelet Coefficients, for the scaling function $\varphi(t)$,
scaling coefficients $h(n)$, and their Fourier transforms $\Phi(\omega)$ and $H(\omega)$ for the multiplier $M = 2$ or two-band
multiresolution system. The various assumptions and conditions are omitted in order to see the “big picture” and
to see the effects of increasing constraints.


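Case | Condition | $\varphi(t)$ | $\Phi(\omega)$ | Signal Space
1 | Multiresolution | $\varphi(t) = \sum_n h(n) \sqrt{2}\, \varphi(2t - n)$ | $\Phi(\omega) = \prod_k \frac{1}{\sqrt{2}} H\!\left(\frac{\omega}{2^k}\right)$ | distribution
2 | Partition of 1 | $\sum_n \varphi(t - n) = 1$ | $\Phi(2\pi k) = \delta(k)$ | distribution
3 | Orthogonal | $\int \varphi(t)\, \varphi(t - k)\, dt = \delta(k)$ | $\sum_k \lvert\Phi(\omega + 2\pi k)\rvert^2 = 1$ | $L^2$
5 | SF Smoothness | $\frac{d^{\ell}\varphi}{dt^{\ell}} < \infty$ | | poly $\in V_j$
6 | SF Moments | $\int t^k \varphi(t)\, dt = 0$ | | Coiflets

Properties of $M = 2$ Scaling Functions (SF) and their Fourier Transforms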

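Case | Condition | $h(n)$ | $H(\omega)$ | Eigenvalues of the transition matrix
1 | Existence | $\sum_n h(n) = \sqrt{2}$ | $H(0) = \sqrt{2}$ |
2 | Fundamental | $\sum_n h(2n) = \sum_n h(2n + 1)$ | $H(\pi) = 0$ | EV $= 1$
3 | QMF | $\sum_n h(n)\, h(n - 2k) = \delta(k)$ | $\lvert H(\omega)\rvert^2 + \lvert H(\omega + \pi)\rvert^2 = 2$ | EV $\le 1$
4 | Orthogonal $L^2$ Basis | $\sum_n h(n)\, h(n - 2k) = \delta(k)$ | $\lvert H(\omega)\rvert^2 + \lvert H(\omega + \pi)\rvert^2 = 2$ and $H(\omega) \ne 0$, $\lvert\omega\rvert \le \pi/3$ | one EV $= 1$, others $< 1$
6 | Coiflets | $\sum_n n^k h(n) = 0$ | |


Properties of $M = 2$ Scaling Coefficients and their Fourier Transforms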


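Case | Condition | $\psi(t)$ | $\Psi(\omega)$ | Signal Space
1 | MRA | $\psi(t) = \sum_n h_1(n) \sqrt{2}\, \varphi(2t - n)$ | $\Psi(\omega) = \frac{1}{\sqrt{2}} H_1\!\left(\frac{\omega}{2}\right) \Phi\!\left(\frac{\omega}{2}\right)$ | distribution
3 | Orthogonal | $\int \varphi(t)\, \psi(t - k)\, dt = 0$ | | $L^2$
3 | Orthogonal | $\int \psi(t)\, \psi(t - k)\, dt = \delta(k)$ | | $L^2$
5 | W Moments | $\int t^k \psi(t)\, dt = 0$ | | poly not $\in W_j$


Properties of $M = 2$ Wavelets (W) and their Fourier Transforms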


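Case | Condition | $h_1(n)$ | $H_1(\omega)$ | Eigenvalues of the transition matrix
2 | Fundamental | $\sum_n h_1(n) = 0$ | $H_1(0) = 0$ |
3 | Orthogonal | $h_1(n) = (-1)^n h(1 - n)$ | $\lvert H_1(\omega)\rvert = \lvert H(\omega + \pi)\rvert$ |
3 | Orthogonal | $\sum_n h_1(n)\, h_1(2m - n) = \delta(m)$ | $\lvert H_1(\omega)\rvert^2 + \lvert H(\omega)\rvert^2 = 2$ |
5 | Smoothness | $\sum_n n^k h_1(n) = 0$ | $H(\omega) = (\omega - \pi)^k \tilde{H}(\omega)$ | $1, \frac{1}{2}, \frac{1}{4}, \cdots$


Properties of $M = 2$ Wavelet Coefficients and their Fourier Transforms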


The different “cases” represent somewhat similar conditions for the stated relationships. For example, in Case 1,
Table 1, the multiresolution conditions are stated in the time and frequency domains while in Table 2 the
corresponding necessary conditions on $h(n)$ are given for a scaling function in $L^1$. However, the conditions are
not sufficient unless general distributions are allowed. In Case 1, Table 3, the definition of a wavelet is given to
span the appropriate multiresolution signal space but nothing seems appropriate for Case 1 in Table 4. Clearly the
organization of these tables is somewhat subjective.


If we “tighten” the restrictions by adding one more linear condition, we get Case 2 which has consequences in
Tables 1, 2, and 4 but does not guarantee anything better than a distribution. Case 3 involves orthogonality, both
across scales and translations, so there are two rows for Case 3 in the tables involving wavelets. Case 4 adds to the
orthogonality a condition on the frequency response $H(\omega)$ or on the eigenvalues of the transition matrix to
guarantee an $L^2$ basis rather than a tight frame guaranteed for Case 3. Cases 5 and 6 concern zero moments and
scaling function smoothness and symmetry.


In some cases, columns 3 and 4 are equivalent and in others they are not. In some categories, a higher numbered
case assumes a lower numbered case and in others it does not. These tables try to give a structure without the
details. It is useful to refer to them while reading the earlier chapters and to refer to the earlier chapters to see the
assumptions and conditions behind these tables.


Types of Wavelet Systems 


Here we try to present a structured list of the various classes of wavelet systems in terms of modification and
generalizations of the basic $M = 2$ system. There are some classes not included here because the whole subject is
still an active research area, producing new results daily. However, this list plus the table of contents, index, and
references will help guide the reader through the maze. The relevant section or chapter is given in parentheses for
each topic.


• Signal Expansions [link]
    ◦ General Expansion Systems [link]
    ◦ Multiresolution Systems [link]

• Multiresolution Wavelet Systems [link]
    ◦ M = 2 or two-band wavelet systems [link]-[link]
    ◦ M > 2 or M-band wavelet systems [link]
    ◦ wavelet packet systems [link]
    ◦ multiwavelet systems [link]

• Length of scaling function filter [link]
    ◦ Compact support wavelet systems
    ◦ Infinite support wavelet systems

• Orthogonality [link]
    ◦ Orthogonal or Orthonormal wavelet bases
    ◦ Semiorthogonal systems
    ◦ Biorthogonal systems [link]

• Symmetry
    ◦ Symmetric scaling functions and wavelets [link], [link]
    ◦ Approximately symmetric systems [link]
    ◦ Minimum phase spectral factorization systems [link]
    ◦ General scaling functions

• Complete and Overcomplete systems [link], [link]
    ◦ Frames
    ◦ Tight frames
    ◦ Redundant systems and transforms [link], [link]
    ◦ Adaptive systems and transforms, pursuit methods [link]

• Discrete and continuous signals and transforms {analogous Fourier method} [link]
    ◦ Discrete Wavelet Transform {Fourier series} [link]
    ◦ Discrete-time Wavelet Transform {Discrete Fourier transforms} [link], [link]
    ◦ Continuous-time Wavelet Transform {Fourier transform or integral} [link]

• Wavelet design [link]
    ◦ Max. zero wavelet moments [Daubechies]
    ◦ Max. zero scaling function moments
    ◦ Max. mixture of SF and wavelet moments zero [Coifman] [link]
    ◦ Max. smooth scaling function or wavelet [Heller, Lang, etc.]
    ◦ Min. scaling variation [Gopinath, Odegard, etc.]
    ◦ Frequency domain criteria
        ▪ Butterworth [Daubechies]
        ▪ least-squares, constrained LS, Chebyshev
    ◦ Cosine modulated for M-band systems [link]

• Descriptions [link]
    ◦ The signal itself
    ◦ The discrete wavelet transform (expansion coefficients)
    ◦ Time functions at various scales or translations
    ◦ Tiling of the time-frequency/scale plane [link]


Appendix A 


This appendix contains outline proofs and derivations for the theorems and formulas given in
the early part of Chapter: The Scaling Function and Scaling Coefficients, Wavelet and Wavelet
Coefficients. They are not intended to be complete or formal, but they should be sufficient to
understand the ideas behind why a result is true and to give some insight into its
interpretation as well as to indicate assumptions and restrictions.


Proof 1 The conditions given by [link] and [link] can be derived by integrating both sides of
Equation:

$$\varphi(x) = \sum_n h(n)\, \sqrt{M}\, \varphi(Mx - n)$$

and making the change of variables $y = Mx$
Equation:

$$\int \varphi(x)\, dx = \sum_n h(n) \int \sqrt{M}\, \varphi(Mx - n)\, dx$$

and noting the integral is independent of translation which gives
Equation:

$$= \sum_n h(n)\, \sqrt{M} \int \varphi(y)\, \frac{1}{M}\, dy .$$

With no further requirements other than $\varphi \in L^1$ to allow the sum and integral interchange
and $\int \varphi(x)\, dx \ne 0$, this gives [link] as
Equation:

$$\sum_n h(n) = \sqrt{M}$$

and for $M = 2$ gives [link]. Note this does not assume orthogonality nor any specific
normalization of $\varphi(t)$ and does not even assume $M$ is an integer.


This is the most basic necessary condition for the existence of $\varphi(t)$ and it has the fewest
assumptions or restrictions.


Proof 2 The conditions in [link] and [link] are a down-sampled orthogonality of translates
by $M$ of the coefficients which results from the orthogonality of translates of the scaling
function given by

Equation:

$$\int \varphi(x)\, \varphi(x - m)\, dx = E\, \delta(m)$$

in [link]. The basic scaling equation [link] is substituted for both functions in [link] giving
Equation:

$$\int \left[ \sum_n h(n) \sqrt{M}\, \varphi(Mx - n) \right] \left[ \sum_k h(k) \sqrt{M}\, \varphi(Mx - Mm - k) \right] dx = E\, \delta(m)$$

which, after reordering and a change of variable $y = Mx$, gives
Equation:

$$\sum_n \sum_k h(n)\, h(k) \int \varphi(y - n)\, \varphi(y - Mm - k)\, dy = E\, \delta(m).$$

Using the orthogonality in [link] gives our result

Equation:

$$\sum_n h(n)\, h(n - Mm) = \delta(m)$$

in [link] and [link]. This result requires the orthogonality condition [link], $M$ must be an
integer, and any non-zero normalization $E$ may be used.


Proof 3 (Corollary 2) The result that
Equation:

$$\sum_n h(2n) = \sum_n h(2n + 1) = 1/\sqrt{2}$$

in [link] or, more generally
Equation:

$$\sum_n h(Mn) = \sum_n h(Mn + k) = 1/\sqrt{M}$$

is obtained by breaking [link] for $M = 2$ into the sum of the even and odd coefficients.
Equation:

$$\sum_n h(n) = \sum_k h(2k) + \sum_k h(2k + 1) = K_0 + K_1 = \sqrt{2}.$$

Next we use [link] and sum over $n$ to give
Equation:

$$\sum_n \sum_k h(k + 2n)\, h(k) = 1$$

which we then split into even and odd sums and reorder to give:
Equation:

$$\sum_k \left[ \sum_n h(2k + 2n) \right] h(2k) + \sum_k \left[ \sum_n h(2k + 1 + 2n) \right] h(2k + 1) = K_0^2 + K_1^2 = 1.$$

Solving [link] and [link] simultaneously gives $K_0 = K_1 = 1/\sqrt{2}$ and our result [link] or
[link] for $M = 2$.


If the same approach is taken with [link] and [link] for $M = 3$, we have
Equation:

$$\sum_n h(n) = \sum_n h(3n) + \sum_n h(3n + 1) + \sum_n h(3n + 2) = \sqrt{3}$$

which, in terms of the partial sums $K_i$, is
Equation:

$$\sum_n h(n) = K_0 + K_1 + K_2 = \sqrt{3}.$$

Using the orthogonality condition [link] as was done in [link] and [link] gives
Equation:

$$K_0^2 + K_1^2 + K_2^2 = 1.$$

Equations [link] and [link] are simultaneously true if and only if $K_0 = K_1 = K_2 = 1/\sqrt{3}$.
This process is valid for any integer $M$ and any non-zero normalization.
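These sum rules are easy to check numerically. The sketch below uses the length-4 Daubechies scaling coefficients as a test case; the coefficients are not listed in this appendix, so they and the helper name `shifted_product` are assumptions made for illustration.

```python
import math

# Length-4 Daubechies scaling coefficients (assumed example), normalized so sum h(n) = sqrt(2).
s3 = math.sqrt(3.0)
h = [v / (4 * math.sqrt(2.0)) for v in (1 + s3, 3 + s3, 3 - s3, 1 - s3)]


def shifted_product(h, m):
    """Return sum_n h(n) h(n - 2m), treating h as zero outside its support."""
    return sum(h[n] * h[n - 2 * m] for n in range(len(h)) if 0 <= n - 2 * m < len(h))


K0 = sum(h[0::2])   # sum of the even-indexed coefficients
K1 = sum(h[1::2])   # sum of the odd-indexed coefficients

total = sum(h)                      # should be close to sqrt(2)
energy = shifted_product(h, 0)      # should be close to 1
shift_one = shifted_product(h, 1)   # should be close to 0
```

Both partial sums come out equal to $1/\sqrt{2}$, so $K_0 + K_1 = \sqrt{2}$ and $K_0^2 + K_1^2 = 1$ hold together, as the proof requires.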


Proof 3 If the support of $\varphi(x)$ is $[0, N - 1]$, from the basic recursion equation with support
of $h(n)$ assumed as $[N_1, N_2]$ we have
Equation:

$$\varphi(x) = \sum_{n = N_1}^{N_2} h(n)\, \sqrt{2}\, \varphi(2x - n)$$

where the support of the right hand side of [link] is $[N_1/2, (N - 1 + N_2)/2]$. Since the
support of both sides of [link] must be the same, the limits on the sum, or the limits on the
indices of the nonzero $h(n)$, are such that $N_1 = 0$ and $N_2 = N - 1$; therefore, the support of
$h(n)$ is $[0, N - 1]$.


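Proof 4 First define the autocorrelation function

$$a(t) = \int \varphi(x)\, \varphi(x - t)\, dx$$

and the power spectrum
Equation:

$$A(\omega) = \int a(t)\, e^{-i\omega t}\, dt = \int\!\!\int \varphi(x)\, \varphi(x - t)\, dx\; e^{-i\omega t}\, dt$$

which after changing variables, $y = x - t$, and reordering operations gives
Equation:

$$A(\omega) = \int \varphi(x)\, e^{-i\omega x}\, dx \int \varphi(y)\, e^{i\omega y}\, dy$$

Equation:

$$= \Phi(\omega)\, \Phi(-\omega) = |\Phi(\omega)|^2 .$$

If we look at [link] as being the inverse Fourier transform of [link] and sample $a(t)$ at $t = k$,
we have
Equation:

$$a(k) = \frac{1}{2\pi} \int_{-\infty}^{\infty} |\Phi(\omega)|^2\, e^{i\omega k}\, d\omega$$

Equation:

$$= \frac{1}{2\pi} \sum_\ell \int_0^{2\pi} |\Phi(\omega + 2\pi\ell)|^2\, e^{i\omega k}\, d\omega = \frac{1}{2\pi} \int_0^{2\pi} \left[ \sum_\ell |\Phi(\omega + 2\pi\ell)|^2 \right] e^{i\omega k}\, d\omega$$

but this integral is the form of an inverse discrete-time Fourier transform (DTFT) which
means
Equation:

$$\sum_k a(k)\, e^{-i\omega k} = \sum_\ell |\Phi(\omega + 2\pi\ell)|^2 .$$

If the integer translates of $\varphi(t)$ are orthogonal, $a(k) = \delta(k)$ and we have our result
Equation:

$$\sum_\ell |\Phi(\omega + 2\pi\ell)|^2 = 1 .$$

If the scaling function is not normalized
Equation:

$$\sum_\ell |\Phi(\omega + 2\pi\ell)|^2 = \int |\varphi(t)|^2\, dt$$

which is similar to Parseval's theorem relating the energy in the frequency domain to the
energy in the time domain.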
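As a numerical illustration of this result, the sketch below assumes the Haar scaling function (the unit box on $[0, 1)$), for which $|\Phi(\omega)|^2 = \left(\sin(\omega/2) / (\omega/2)\right)^2$, and sums the shifted copies over a finite range of $\ell$. The function name `power_sum` and the truncation length are illustrative choices; the truncated sum only approximates the infinite one.

```python
import math


def power_sum(w, L=10000):
    """Truncated sum over l = -L..L of |Phi(w + 2*pi*l)|^2 for the Haar box function."""
    total = 0.0
    for l in range(-L, L + 1):
        u = (w + 2 * math.pi * l) / 2
        total += (math.sin(u) / u) ** 2 if u != 0 else 1.0
    return total


values = [power_sum(w) for w in (0.3, 1.0, 2.5)]
```

Each value is close to one regardless of $\omega$, which is the frequency-domain statement that the integer translates of the box function are orthonormal.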


Proof 6 Equation [link] states a very interesting property of the frequency response of an
FIR filter with the scaling coefficients as filter coefficients. This result can be derived in the
frequency or time domain. We will show the frequency domain argument. The scaling
equation [link] becomes [link] in the frequency domain. Taking the squared magnitude of
both sides of a scaled version of

Equation:

$$\Phi(\omega) = \frac{1}{\sqrt{2}}\, H(\omega/2)\, \Phi(\omega/2)$$

gives
Equation:

$$|\Phi(2\omega)|^2 = \frac{1}{2}\, |H(\omega)|^2\, |\Phi(\omega)|^2 .$$

Add $k\pi$ to $\omega$ and sum over $k$ to give for the left side of [link]
Equation:

$$\sum_k |\Phi(2\omega + 2\pi k)|^2 = K = 1$$

which is unity from [link]. Summing the right side of [link] gives
Equation:

$$\sum_k \frac{1}{2}\, |H(\omega + k\pi)|^2\, |\Phi(\omega + k\pi)|^2 .$$

Break this sum into a sum of the even and odd indexed terms.
Equation:

$$\sum_k \frac{1}{2}\, |H(\omega + 2\pi k)|^2\, |\Phi(\omega + 2\pi k)|^2 + \sum_k \frac{1}{2}\, |H(\omega + (2k + 1)\pi)|^2\, |\Phi(\omega + (2k + 1)\pi)|^2$$

Equation:

$$= \frac{1}{2}\, |H(\omega)|^2 \sum_k |\Phi(\omega + 2\pi k)|^2 + \frac{1}{2}\, |H(\omega + \pi)|^2 \sum_k |\Phi(\omega + (2k + 1)\pi)|^2$$

which after using [link] gives
Equation:

$$= \frac{1}{2}\, |H(\omega)|^2 + \frac{1}{2}\, |H(\omega + \pi)|^2 = 1$$

which gives [link]. This requires both the scaling and orthogonal relations but no specific
normalization of $\varphi(t)$. If viewed as an FIR filter, $h(n)$ is called a quadrature mirror filter
(QMF) because of the symmetry of its frequency response about $\pi$.
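The power-complementary property can be checked on a frequency grid. The sketch below again assumes the length-4 Daubechies coefficients as a test filter and the convention $H(\omega) = \sum_n h(n) e^{-i\omega n}$; the function names are hypothetical.

```python
import cmath
import math

# Length-4 Daubechies scaling coefficients (assumed example filter).
s3 = math.sqrt(3.0)
h = [v / (4 * math.sqrt(2.0)) for v in (1 + s3, 3 + s3, 3 - s3, 1 - s3)]


def freq_response(c, w):
    """H(w) = sum_n c(n) exp(-i w n)."""
    return sum(cn * cmath.exp(-1j * w * n) for n, cn in enumerate(c))


def power_complementary(c, w):
    """|H(w)|^2 + |H(w + pi)|^2, which should equal 2 for an orthonormal filter."""
    return abs(freq_response(c, w)) ** 2 + abs(freq_response(c, w + math.pi)) ** 2


grid = [2 * math.pi * k / 16 for k in range(16)]
values = [power_complementary(h, w) for w in grid]
```

Every entry of `values` is equal to 2 up to rounding, and the same helper shows $H(0) = \sqrt{2}$ and $H(\pi) = 0$ for this filter.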


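Proof 10 The multiresolution assumptions in [link] require the scaling function and wavelet
satisfy [link] and [link]
Equation:

$$\varphi(t) = \sum_n h(n)\, \sqrt{2}\, \varphi(2t - n), \qquad \psi(t) = \sum_n h_1(n)\, \sqrt{2}\, \varphi(2t - n)$$

and orthonormality requires
Equation:

$$\int \varphi(t)\, \varphi(t - k)\, dt = \delta(k)$$

and
Equation:

$$\int \psi(t)\, \varphi(t - k)\, dt = 0$$

for all $k \in \mathbb{Z}$. Substituting [link] into [link] gives
Equation:

$$\int \left[ \sum_n h_1(n)\, \sqrt{2}\, \varphi(2t - n) \right] \left[ \sum_\ell h(\ell)\, \sqrt{2}\, \varphi(2t - 2k - \ell) \right] dt = 0 .$$

Rearranging and making a change of variables gives
Equation:

$$\sum_n \sum_\ell h_1(n)\, h(\ell) \int \varphi(y - n)\, \varphi(y - 2k - \ell)\, dy = 0 .$$

Using [link] gives
Equation:

$$\sum_n \sum_\ell h_1(n)\, h(\ell)\, \delta(n - 2k - \ell) = 0$$

for all $k \in \mathbb{Z}$. Summing over $\ell$ gives
Equation:

$$\sum_n h_1(n)\, h(n - 2k) = 0 .$$

Separating [link] into even and odd indices gives
Equation:

$$\sum_m h_1(2m)\, h(2m - 2k) + \sum_\ell h_1(2\ell + 1)\, h(2\ell + 1 - 2k) = 0$$

which must be true for all integer $k$. Defining $h_e(n) = h(2n)$, $h_o(n) = h(2n + 1)$ and
$\tilde{g}(n) = g(-n)$ for any sequence $g$, this becomes
Equation:

$$h_e \star \tilde{h}_{1e} + h_o \star \tilde{h}_{1o} = 0 .$$

From the orthonormality of the translates of $\varphi$ and $\psi$ one can similarly obtain the following:
Equation:

$$h_e \star \tilde{h}_e + h_o \star \tilde{h}_o = \delta .$$

Equation:

$$h_{1e} \star \tilde{h}_{1e} + h_{1o} \star \tilde{h}_{1o} = \delta .$$

This can be compactly represented as

$$\begin{bmatrix} h_e & h_o \\ h_{1e} & h_{1o} \end{bmatrix} \star \begin{bmatrix} \tilde{h}_e & \tilde{h}_{1e} \\ \tilde{h}_o & \tilde{h}_{1o} \end{bmatrix} = \begin{bmatrix} \delta & 0 \\ 0 & \delta \end{bmatrix} .$$

Assuming the sequences are finite length, [link] can be used to show that
Equation:

$$h_e \star h_{1o} - h_o \star h_{1e} = \pm \delta_k ,$$

where $\delta_k(n) = \delta(n - k)$. Indeed, taking the Z-transform of [link] we get, using the notation
of Chapter: Filter Banks and Transmultiplexers, $H_p(z)\, H_p^T(z^{-1}) = I$. Because the filters
are FIR, $H_p(z)$ is a (Laurent) polynomial matrix with a polynomial matrix inverse. Therefore
the determinant of $H_p(z)$ is of the form $\pm z^k$ for some integer $k$. This is equivalent to [link].


Now, convolving both sides of [link] by $\tilde{h}_e$ we get
Equation:

$$
\begin{aligned}
\pm \tilde{h}_e \star \delta_k &= \left[ h_e \star h_{1o} - h_o \star h_{1e} \right] \star \tilde{h}_e \\
&= h_e \star \tilde{h}_e \star h_{1o} - h_{1e} \star \tilde{h}_e \star h_o \\
&= h_e \star \tilde{h}_e \star h_{1o} + h_{1o} \star \tilde{h}_o \star h_o \\
&= \left[ h_e \star \tilde{h}_e + h_o \star \tilde{h}_o \right] \star h_{1o} \\
&= h_{1o} .
\end{aligned}
$$

Similarly by convolving both sides of [link] by $\tilde{h}_o$ we get
Equation:

$$\mp \tilde{h}_o \star \delta_k = h_{1e} .$$

Combining [link] and [link] gives the result
Equation:

$$h_1(n) = \pm (-1)^n\, h(-n + 1 - 2k) .$$
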
Proof 11 We show the integral of the wavelet is zero. Integrating both sides of ([link]b)
gives
Equation:

$$\int \psi(t)\, dt = \sum_n h_1(n) \int \sqrt{2}\, \varphi(2t - n)\, dt .$$

But the integral on the right hand side is proportional to $A_0$ (it equals $A_0/\sqrt{2}$), which is
usually normalized to one, and from [link] or [link] and [link] we know that
Equation:

$$\sum_n h_1(n) = 0$$

and, therefore, from [link], the integral of the wavelet is zero.


The fact that multiplying in the time domain by $(-1)^n$ is equivalent to shifting in the
frequency domain by $\pi$ gives $|H_1(\omega)| = |H(\omega + \pi)|$.
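The results of Proofs 10 and 11 can be sketched numerically. The example assumes the length-4 Daubechies scaling coefficients and builds the wavelet coefficients as $h_1(n) = (-1)^n h(N - 1 - n)$, which is the general result with the shift chosen so that $h_1$ has the same support as $h$. The helper names are hypothetical.

```python
import cmath
import math

# Length-4 Daubechies scaling coefficients (assumed example filter).
s3 = math.sqrt(3.0)
h = [v / (4 * math.sqrt(2.0)) for v in (1 + s3, 3 + s3, 3 - s3, 1 - s3)]


def wavelet_filter(h):
    """Alternating flip: h1(n) = (-1)^n h(N - 1 - n) for n = 0 .. N - 1."""
    n_taps = len(h)
    return [(-1) ** n * h[n_taps - 1 - n] for n in range(n_taps)]


def cross_sum(h1, h, k):
    """Return sum_n h1(n) h(n - 2k), treating h as zero outside its support."""
    return sum(h1[n] * h[n - 2 * k] for n in range(len(h1)) if 0 <= n - 2 * k < len(h))


def freq_response(c, w):
    """H(w) = sum_n c(n) exp(-i w n)."""
    return sum(cn * cmath.exp(-1j * w * n) for n, cn in enumerate(c))


h1 = wavelet_filter(h)
zero_sum = sum(h1)                                   # corresponds to a zero-integral wavelet
cross_terms = [cross_sum(h1, h, k) for k in (-1, 0, 1)]
```

The coefficient sum and all cross terms vanish up to rounding, and comparing `abs(freq_response(h1, w))` with `abs(freq_response(h, w + math.pi))` confirms the magnitude relation at any test frequency.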


Appendix B 
In this appendix we develop most of the results on scaling functions, wavelets
and scaling and wavelet coefficients presented in [link] and elsewhere. For
convenience, we repeat [link], [link], [link], and [link] here
Equation:

$$\varphi(t) = \sum_n h(n)\, \sqrt{2}\, \varphi(2t - n)$$


Equation: 


Equation: 


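$$\int \varphi(t)\, \varphi(t - k)\, dt = E\, \delta(k) = \begin{cases} E & \text{if } k = 0 \\ 0 & \text{otherwise} \end{cases}$$

If normalized
Equation:

$$\int \varphi(t)\, \varphi(t - k)\, dt = \delta(k) = \begin{cases} 1 & \text{if } k = 0 \\ 0 & \text{otherwise} \end{cases}$$

The results in this appendix refer to equations in the text written in bold face
fonts.


Equation [link] is the normalization of [link] and part of the orthonormal
conditions required by [link] for $k = 0$ and $E = 1$.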


Equation [link] If the $\varphi(x - k)$ are orthogonal, [link] states

$$\int \varphi(x + m)\, \varphi(x)\, dx = E\, \delta(m) .$$

Summing both sides over $m$ gives
Equation:

$$\sum_m \int \varphi(x + m)\, \varphi(x)\, dx = E$$

which after reordering is
Equation:

$$\int \varphi(x) \sum_m \varphi(x + m)\, dx = E .$$

Using [link], [link], and [link] gives
Equation:

$$\int \varphi(x)\, dx\; A_0 = E$$

but $\int \varphi(x)\, dx = A_0$ from [link], therefore
Equation:

$$A_0^2 = E .$$

If the scaling function is not normalized to unity, one can show the more
general result of [link]. This is done by noting that a more general form of
[link] is
Equation:

$$\sum_m \varphi(x + m) = \int \varphi(x)\, dx$$

if one does not normalize $A_0 = 1$ in [link] through [link].


Equation [link] follows from summing [link] over $m$ as
Equation:

$$\sum_m \int \varphi(x + m)\, \varphi(x)\, dx = \int \varphi(x)^2\, dx$$

which after reordering gives
Equation:

$$\int \varphi(x) \sum_m \varphi(x + m)\, dx = \int \varphi(x)^2\, dx$$

and using [link] gives [link].


Equation [link] is derived by applying the basic recursion equation to its own
right hand side to give
Equation:

$$\varphi(t) = \sum_n h(n)\, \sqrt{2} \sum_k h(k)\, \sqrt{2}\, \varphi(2(2t - n) - k)$$

which, with a change of variables of $\ell = 2n + k$ and reordering of operations,
becomes
Equation:

$$\varphi(t) = \sum_\ell \left[ \sum_n h(n)\, h(\ell - 2n) \right] 2\, \varphi(4t - \ell) .$$

Applying this $j$ times gives the result in [link]. A similar result can be derived
for the wavelet.


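Equation [link] is derived by defining the sum
Equation:

$$A_J = \sum_\ell \varphi\!\left( \frac{\ell}{2^J} \right)$$

and using the basic recursive equation [link] to give
Equation:

$$A_J = \sum_\ell \sum_n h(n)\, \sqrt{2}\, \varphi\!\left( \frac{\ell}{2^{J-1}} - n \right) .$$

Interchanging the order of summation gives
Equation:

$$A_J = \sqrt{2} \sum_n h(n) \left\{ \sum_\ell \varphi\!\left( \frac{\ell}{2^{J-1}} - n \right) \right\}$$

but the summation over $\ell$ is independent of an integer shift so that using [link]
and [link] gives
Equation:

$$A_J = \sqrt{2} \sum_n h(n) \left\{ \sum_\ell \varphi\!\left( \frac{\ell}{2^{J-1}} \right) \right\} = \sqrt{2}\, \sqrt{2}\, A_{J-1} = 2 A_{J-1} .$$

This is the linear difference equation
Equation:

$$A_J - 2 A_{J-1} = 0$$

which has as a solution the geometric sequence
Equation:

$$A_J = A_0\, 2^J .$$

If the limit exists, equation [link] divided by $2^J$ is the Riemann sum whose
limit is the definition of the Riemann integral of $\varphi(x)$
Equation:

$$\lim_{J \to \infty} \left\{ A_J\, \frac{1}{2^J} \right\} = \int \varphi(x)\, dx = A_0 .$$

It is stated in [link] and shown in [link] that if $\varphi(x)$ is normalized, then
$A_0 = 1$ and [link] becomes
Equation:

$$A_J = 2^J ,$$

which gives [link].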


Equation [link] shows another remarkable property of $\varphi(x)$ in that the
bracketed term is exactly equal to the integral, independent of $J$. No limit
need be taken!
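This property can be seen with a scaling function that has a closed form. The sketch below assumes the linear B-spline (hat function) on $[0, 2]$, which satisfies a two-scale equation with $h = [1/2, 1, 1/2]/\sqrt{2}$ and has unit integral; it is not orthogonal to its translates, but it obeys the sum rules used in this derivation. The function names are illustrative.

```python
def hat(t):
    """Linear B-spline supported on [0, 2] with peak value 1 at t = 1."""
    return max(0.0, 1.0 - abs(t - 1.0))


def dyadic_sum(J):
    """A_J = sum over l of hat(l / 2^J); only l = 0 .. 2^(J+1) can be nonzero."""
    return sum(hat(l / 2 ** J) for l in range(0, 2 ** (J + 1) + 1))


def two_scale_rhs(t):
    """Right side of the assumed two-scale equation, with sqrt(2) h(n) = [1/2, 1, 1/2]."""
    return 0.5 * hat(2 * t) + hat(2 * t - 1) + 0.5 * hat(2 * t - 2)


sums = [dyadic_sum(J) for J in range(8)]
```

Each `dyadic_sum(J)` equals $2^J$ exactly, so the Riemann sum $A_J / 2^J$ already equals the integral for every $J$ without taking a limit.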


Equation [link] is the “partitioning of unity” by $\varphi(x)$. It follows from [link]
by setting $J = 0$.


Equation [link] is a generalization of [link], obtained by noting that the sum in [link] is
independent of a shift of the form
Equation:

$$A_J = \sum_\ell \varphi\!\left( \frac{\ell}{2^J} - \frac{L}{2^M} \right)$$

for any integers $M > J$ and $L$. In the limit as $M \to \infty$, $\frac{L}{2^M}$ can be made
arbitrarily close to any $x$; therefore, if $\varphi(x)$ is continuous,
Equation:

$$\sum_\ell \varphi\!\left( \frac{\ell}{2^J} - x \right) = 2^J .$$

This gives [link] and becomes [link] for $J = 0$. Equation [link] is called a
“partitioning of unity” for obvious reasons.


The first four relationships for the scaling function hold in a generalized form
for the more general defining equation [link]. Only [link] is different. It
becomes
Equation:

$$\sum_k \varphi\!\left( \frac{k}{M^J} \right) = M^J$$

for $M$ an integer. It may be possible to show that certain rational $M$ are
allowed.


Equations [link], [link], and [link] are the recursive relationship for the 
Fourier transform of the scaling function and are obtained by simply taking 
the transform [link] of both sides of [link], giving 
Equation: 


$$\Phi(\omega) = \sum_n h(n) \int \sqrt{2}\,\varphi(2t - n)\,e^{-i\omega t}\,dt$$


which after the change of variables $y = 2t - n$ becomes 
Equation: 


$$\Phi(\omega) = \frac{\sqrt{2}}{2} \sum_n h(n) \int \varphi(y)\,e^{-i\omega(y+n)/2}\,dy$$


and using [link] gives 
Equation: 


$$\Phi(\omega) = \frac{1}{\sqrt{2}} \sum_n h(n)\,e^{-i\omega n/2} \int \varphi(y)\,e^{-i\omega y/2}\,dy = \frac{1}{\sqrt{2}}\,H(\omega/2)\,\Phi(\omega/2)$$


which is [link] and [link]. Applying this recursively gives the infinite product 
[link] which holds for any normalization. 
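
The frequency-domain recursion can be checked on a case where both transforms are known in closed form. The sketch below is only an illustration and assumes the Haar system (our choice): $h = [1, 1]/\sqrt{2}$ and $\varphi(t) = 1$ on $[0, 1)$, together with the sign convention $H(\omega) = \sum_n h(n)\,e^{-i\omega n}$. The names `Phi`, `H`, `lhs`, and `rhs` are ours.

```matlab
% Sketch: check Phi(w) = (1/sqrt(2)) H(w/2) Phi(w/2) for the Haar system.
% Assumptions: h = [1 1]/sqrt(2), phi(t) = 1 on [0,1), and
% H(w) = sum_n h(n) exp(-i w n).
w   = linspace(0.1, 20, 200);             % avoid w = 0 (removable singularity)
Phi = @(w) (1 - exp(-1i*w))./(1i*w);      % closed-form transform of the box
H   = @(w) (1 + exp(-1i*w))/sqrt(2);      % DTFT of the Haar coefficients
lhs = Phi(w);
rhs = H(w/2).*Phi(w/2)/sqrt(2);
err = max(abs(lhs - rhs));                % should be at rounding level
fprintf('max |lhs - rhs| = %.3e\n', err);
```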


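Equation [link] states that the sum of the squares of samples of the Fourier 
transform of the scaling function is one if the samples are uniform every $2\pi$. 
An alternative derivation to that in Appendix A is shown here by taking the 
definition of the Fourier transform of $\varphi(x)$, sampling it at the points 
$\omega + 2\pi k$, and multiplying it by its complex conjugate. 

Equation: 


$$\Phi(\omega + 2\pi k)\,\overline{\Phi(\omega + 2\pi k)} = \int \varphi(x)\,e^{-i(\omega + 2\pi k)x}\,dx \int \varphi(y)\,e^{i(\omega + 2\pi k)y}\,dy$$


Summing over $k$ gives 


Equation: 
$$\sum_k |\Phi(\omega + 2\pi k)|^2 = \sum_k \iint \varphi(x)\,\varphi(y)\,e^{-i\omega(x-y)}\,e^{-i2\pi k(x-y)}\,dx\,dy$$
Equation: 
$$= \iint \varphi(x)\,\varphi(y)\,e^{i\omega(y-x)} \sum_k e^{i2\pi k(y-x)}\,dx\,dy$$
Equation: 
$$= \iint \varphi(x)\,\varphi(x+z)\,e^{i\omega z} \sum_k e^{i2\pi k z}\,dx\,dz$$

where $z = y - x$, but 

Equation: 


$$\sum_k e^{i2\pi k z} = \sum_\ell \delta(z - \ell)$$


therefore 


Equation: 


$$\sum_k |\Phi(\omega + 2\pi k)|^2 = \iint \varphi(x)\,\varphi(x+z)\,e^{i\omega z} \sum_\ell \delta(z - \ell)\,dx\,dz$$


which becomes 
Equation: 


$$\sum_\ell \int \varphi(x)\,\varphi(x + \ell)\,dx\; e^{i\omega\ell}.$$


Because of the orthogonality of integer translates of $\varphi(x)$, this is not a 
function of $\omega$ but is $\int |\varphi(x)|^2\,dx$ which, if normalized, is unity as stated in 
[link]. This is the frequency domain equivalent of [link]. 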
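
This sum can be examined numerically for a simple case. The following sketch assumes the Haar scaling function (our choice for illustration), for which $|\Phi(\omega)|^2 = \left(\sin(\omega/2)/(\omega/2)\right)^2$, and truncates the sum over $k$ at an arbitrary limit `K`. Because the terms decay only like $1/k^2$, the truncated sum is close to, but not exactly, one.

```matlab
% Sketch: truncated check that sum_k |Phi(w + 2*pi*k)|^2 = 1 for Haar.
% Assumptions: phi(t) = 1 on [0,1); truncation limit K chosen arbitrarily.
Phi2 = @(w) (sin(w/2)./(w/2)).^2;     % |Phi(w)|^2 for the box function
w = [0.3, 1.0, 2.5];                  % test frequencies, not multiples of 2*pi
K = 1e5;                              % truncation; the neglected tail is O(1/K)
k = (-K:K).';
S = zeros(size(w));
for m = 1:length(w)
    S(m) = sum(Phi2(w(m) + 2*pi*k));  % truncated sum over k
end
disp(S)                               % each entry should be close to 1
```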

Equations [link] and [link] show how the scaling function determines the 
equation coefficients. This is derived by multiplying both sides of [link] by 
$\varphi(2x - m)$ and integrating to give 

Equation: 


$$\int \varphi(x)\,\varphi(2x - m)\,dx = \int \sum_n h(n)\,\sqrt{2}\,\varphi(2x - n)\,\varphi(2x - m)\,dx$$
Equation: 


$$= \frac{1}{\sqrt{2}} \sum_n h(n) \int \varphi(x - n)\,\varphi(x - m)\,dx.$$


Using the orthogonality condition [link] gives 
Equation: 


$$\int \varphi(x)\,\varphi(2x - m)\,dx = h(m)\,\frac{1}{\sqrt{2}} \int |\varphi(y)|^2\,dy = \frac{1}{\sqrt{2}}\,h(m)$$


which gives [link]. A similar argument gives [link]. 
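
As a small illustration of this inner-product relation, the sketch below recovers the coefficients from the scaling function by numerical integration. It assumes the Haar scaling function (our choice) and a midpoint rule with an arbitrary number of points `M`. The names `phi`, `M`, and `hest` are ours.

```matlab
% Sketch: recover h(m) = sqrt(2) * integral of phi(x) phi(2x - m) dx.
% Assumptions: Haar scaling function phi(t) = 1 on [0,1); midpoint rule.
phi  = @(x) double(x >= 0 & x < 1);   % Haar scaling function
M    = 10000;                         % number of midpoints on [0,1)
x    = ((0:M-1) + 0.5)/M;             % midpoints covering the support
hest = zeros(1, 2);
for m = 0:1
    hest(m+1) = sqrt(2)*sum(phi(x).*phi(2*x - m))/M;
end
disp(hest)                            % both entries should be near 1/sqrt(2)
```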


Appendix C 


You are free to use these programs or any derivative of them for any 
scientific purpose, but please reference this book. Updated versions of these 
programs and others can be found on our web page at: http://www-dsp.rice.edu/ 


function p = psa(h,kk)
% p = psa(h,kk) calculates samples of the scaling function
%  phi(t) = p  by  kk  successive approximations from the
%  scaling coefficients  h.  Initial iteration is a constant.
%  phi_k(t) is plotted at each iteration.     csb 5/19/93
%
if nargin==1, kk=11; end;      % Default number of iterations
h2 = h*2/sum(h);               % normalize h(n)
K = length(h2)-1; S = 128;     % Sets sample density
p = [ones(1,3*S*K),0]/(3*K);   % Sets initial iteration
P = p(1:K*S);                  % Store for later plotting
axis([0 K*S+2 -.5 1.4]);
hu = upsam(h2,S);              % upsample h(n) by S
for iter = 0:kk                % Successive approx.
   p = dnsample(conv(hu,p));   % convolve and down-sample
   plot(p); pause;             % plot each iteration
%  P = [P;p(1:K*S)];           % store each iter. for plotting
end
p = p(1:K*S);                  % only the supported part
L = length(p);
x = ([1:L])/(S);
axis([0 3 -.5 1.4]);
plot(x,p);                     % Final plot
title('Scaling Function by Successive Approx.');
ylabel('Scaling Function');
xlabel('x');


function p = pdyad(h,kk)
% p = pdyad(h,kk) calculates approx. (L-1)*2^(kk+2) samples of the
%  scaling function phi(t) = p by kk+3 dyadic expansions
%  from the scaling coefficient vector h where L=length(h).
%  Also plots phi_k(t) at each iteration.     csb 5/19/93
%
if nargin==1, kk = 8; end      % Default iterations
h2 = h*2/sum(h);               % Normalize
N = length(h2); hr = h2(N:-1:1); hh = h2;
axis([0,N-1,-.5,1.4]);
MR = [hr,zeros(1,2*N-2)];      % Generate row for M0
MT = MR; M0 = [];
for k = 1:N-1                  % Generate convolution and
   MR = [0, 0, MR(1:3*N-4)];   % downsample matrix from h(n)
   MT = [MT; MR];
end
M0 = MT(:,N:2*N-1);            % M0*p = p if p samples of phi
MI = M0 - eye(N);
MJ = [MI(1:N-1,:);ones(1,N)];
pp = MJ\[zeros(N-1,1);1];      % Samples of phi at integers
p = pp(2:length(pp)-1).';
x = [0:length(p)+1]*(N-1)/(length(p)+1);
plot(x,[0,p,0]); pause
p = conv(h2,p);                % value on half integers
x = [0:length(p)+1]*(N-1)/(length(p)+1);
plot(x,[0,p,0]); pause
y = conv(h2,dnsample(p));      % convolve and downsample
p = merge(y,p);                % interleave values on Z and Z/2
x = [0:length(p)+1]*(N-1)/(length(p)+1);
plot(x,[0,p,0]); pause
for k = 1:kk
   hh = upsample(hh);          % upsample coefficients
   y = conv(hh,y);             % calculate intermediate terms
   p = merge(y,p);             % insert new terms between old
   x = [0:length(p)+1]*(N-1)/(length(p)+1);
   plot(x,[0,p,0]); pause;
end
title('Scaling Function by Dyadic Expansion');
ylabel('Scaling Function');
xlabel('x');
axis;


function [hf,ht] = pf(h,kk)
% [hf,ht] = pf(h,kk) computes and plots hf, the Fourier transform
%  of the scaling function phi(t) using the freq domain
%  infinite product formulation with kk iterations from the scaling
%  function coefficients h.  Also calculates and plots ht = phi(t)
%  using the inverse FFT.                     csb 5/19/93
if nargin==1, kk=8; end          % Default iterations
L = 2^12; P = L;                 % Sets number of sample points
hp = fft(h,L); hf = hp;          % Initializes iteration
plot(abs(hf)); pause;            % Plots first iteration
for k = 1:kk                     % Iterations
   hp = [hp(1:2:L), hp(1:2:L)];  % Sample
   hf = hf.*hp/sqrt(2);          % Product
   plot(abs(hf(1:P/2))); pause;  % Plot Phi(omega) each iteration
   P = P/2;                      % Scales axis for plot
end;
ht = real(ifft(hf));             % phi(t) from inverse FFT
ht = ht(1:8*2^kk); plot(ht(1:6*2^kk));   % Plot phi(t)


function hn = daub(N2)
% hn = daub(N2)
% Function to compute the Daubechies scaling coefficients from
% her development in the paper, "Orthonormal bases of compactly
% supported wavelets", CPAM, Nov. 1988 page 977, or in her book
% "Ten Lectures on Wavelets", SIAM, 1992 pages 168, 216.
% The polynomial R in the reference is set to zero and the
% minimum phase factorization is used.
% Not accurate for N > 20.  Check results for long h(n).
% Input:  N2 = N/2, where N is the length of the filter.
% Output: hn = h(n) length-N min phase scaling fn coefficients
% by rag 10/10/88, csb 3/23/93
a = 1; p = 1; q = 1;             % Initialization of variables
hn = [1 1];                      % Initialize factors of zeros at -1
for j = 1:N2-1,
   hn = conv(hn,[1,1]);          % Generate polynomial for zeros at -1
   a = -a*0.25*(j+N2-1)/j;       % Generate the binomial coeff. of L
   p = conv(p,[1,-2,1]);         % Generate variable values for L
   q = [0 q 0] + a*p;            % Combine terms for L
end;
q = sort(roots(q));              % Factor L
hn = conv(hn,real(poly(q(1:N2-1))));   % Combine zeros at -1 and L
hn = hn*sqrt(2)/(sum(hn));       % Normalize


function h = h246(a,b)
% h = h246(a,b) generates orthogonal scaling function
%  coefficients h(n) for lengths 2, 4, and 6 using
%  Resnikoff's parameterization with angles a and b.
%  csb. 4/4/93
if a==b, h = [1,1]/sqrt(2);            % Length-2
elseif b==0
   h0 = (1 - cos(a) + sin(a))/2;       % Length-4
   h1 = (1 + cos(a) + sin(a))/2;
   h2 = (1 + cos(a) - sin(a))/2;
   h3 = (1 - cos(a) - sin(a))/2;
   h = [h0 h1 h2 h3]/sqrt(2);
else                                   % Length-6
   h0 = ((1+cos(a)+sin(a))*(1-cos(b)-sin(b))+2*sin(b)*cos(a))/4;
   h1 = ((1-cos(a)+sin(a))*(1+cos(b)-sin(b))-2*sin(b)*cos(a))/4;
   h2 = (1+cos(a-b)+sin(a-b))/2;
   h3 = (1+cos(a-b)-sin(a-b))/2;
   h4 = (1-h0-h2);
   h5 = (1-h1-h3);
   h = [h0 h1 h2 h3 h4 h5]/sqrt(2);
end


function [a,b] = ab(h)
% [a,b] = ab(h) calculates the parameters a and b from the
%  scaling function coefficient vector h for orthogonal
%  systems of length 2, 4, or 6 only.         csb. 5/19/93.
%
h = h*2/sum(h); x = 0;                        % normalization
if length(h)==2, h = [0 0 h 0 0]; x = 2; end; % pad to length 6
if length(h)==4, h = [0 h 0];     x = 4; end; % pad to length 6
a = atan2((2*(h(1)^2+h(2)^2-1)+h(3)+h(4)), ...
          (2*h(2)*(h(3)-1)-2*h(1)*(h(4)-1)));
b = a - atan2((h(3)-h(4)),(h(3)+h(4)-1));
if x==2, a = 1; b = 1; end;
if x==4, b = 0; end;


function y = upsample(x)
% y = upsample(x) inserts zeros between each term in the row vector x.
%  for example: [1 0 2 0 3] = upsample([1 2 3]).   csb 3/1/93.
L = length(x);
y = [x(:).'; zeros(1,L)];      % one zero under each sample
y = y(:).';                    % read out column by column as a row
y = y(1:2*L-1);                % drop the trailing zero


function y = upsam(x,S)
% y = upsam(x,S) inserts S-1 zeros between each term in the row vector x.
%  for example: [1 0 2 0 3] = upsam([1 2 3],2).   csb 3/1/93.
L = length(x);
y = [x(:).'; zeros(S-1,L)];    % S-1 zeros under each sample
y = y(:).';                    % read out column by column as a row
y = y(1:S*L-1);                % drop the last zero


function y = dnsample(x)
% y = dnsample(x) samples x by removing the even terms in x.
%  for example: [1 3] = dnsample([1 2 3 4]).   csb 3/1/93.
L = length(x);
y = x(1:2:L);


function z = merge(x,y)
% z = merge(x,y) interleaves the two vectors x and y.
%  Example: [1 2 3 4 5] = merge([1 3 5],[2 4]).
%  csb 3/1/93.
z = [x;y,0];
z = z(:);
z = z(1:length(z)-1).';


function w = wave(p,h)
% w = wave(p,h) calculates and plots the wavelet psi(t)
%  from the scaling function p and the scaling function
%  coefficient vector h.
%  It uses the definition of the wavelet.     csb. 5/19/93.
%
h2 = h*2/sum(h);
NN = length(h2); LL = length(p); KK = round((LL)/(NN-1));
h1u = upsam(h2(NN:-1:1).*cos(pi*[0:NN-1]),KK);
w = dnsample(conv(h1u,p)); w = w(1:LL);
xx = [0:LL-1]*(NN-1)/(LL-1);
axis([1 2 3 4]); axis;
plot(xx,w);


function g = dwt(f,h,NJ)
% function g = dwt(f,h,NJ); Calculates the DWT of periodic f
%  with scaling filter h and NJ scales.       rag & csb 3/17/94.
%
N = length(h); L = length(f);
c = f; t = [];
if nargin==2, NJ = round(log10(L)/log10(2)); end;  % Number of scales
h0 = fliplr(h);                          % Scaling filter
h1 = h; h1(1:2:N) = -h1(1:2:N);          % Wavelet filter
for j = 1:NJ                             % Mallat's algorithm
   L = length(c);
   c = [c(mod((-(N-1):-1),L)+1) c];      % Make periodic
   d = conv(c,h1); d = d(N:2:(N+L-2));   % Convolve & d-sample
   c = conv(c,h0); c = c(N:2:(N+L-2));   % Convolve & d-sample
   t = [d,t];                            % Concatenate wlet coeffs.
end;
g = [c,t];                               % The DWT


function f = idwt(g,h,NJ)
% function f = idwt(g,h,NJ); Calculates the IDWT of periodic g
%  with scaling filter h and NJ scales.       rag & csb 3/17/94.
%
L = length(g); N = length(h);
if nargin==2, NJ = round(log10(L)/log10(2)); end;  % Number of scales
h0 = h;                                  % Scaling filter
h1 = fliplr(h); h1(2:2:N) = -h1(2:2:N);  % Wavelet filter
LJ = L/(2^NJ);                           % Number of SF coeffs.
c = g(1:LJ);                             % Scaling coeffs.
for j = 1:NJ                             % Mallat's algorithm
   w = mod(0:N/2-1,LJ)+1;                % Make periodic
   d = g(LJ+1:2*LJ);                     % Wavelet coeffs.
   cu(1:2:2*LJ+N) = [c c(1,w)];          % Up-sample & periodic
   du(1:2:2*LJ+N) = [d d(1,w)];          % Up-sample & periodic
   c = conv(cu,h0) + conv(du,h1);        % Convolve & combine
   c = c(N:N+2*LJ-1);                    % Periodic part
   LJ = 2*LJ;
end;
f = c;                                   % The inverse DWT


function r = mod(m,n)
% r = mod(m,n) calculates r = m modulo n
%
r = m - n*floor(m/n);                    % Matrix modulo n


function g = dwt5(f,h,NJ)
% function g = dwt5(f,h,NJ)
% Program to calculate the DWT from the L samples of f(t) in
% the vector f using the scaling filter h(n).
%  csb 3/20/94
%
N = length(h); L = length(f);
c = f; t = [];
if nargin==2
   NJ = round(log10(L)/log10(2));        % Number of scales
end;
h1 = h; h1(1:2:N) = -h1(1:2:N);          % Wavelet filter
h0 = fliplr(h);                          % Scaling filter
for j = 1:NJ                             % Mallat's algorithm
   L = length(c);
   d = conv(c,h1);                       % Convolve
   c = conv(c,h0);                       % Convolve
   Lc = length(c);
   while Lc > 2*L                        % Multi-wrap?
      d = [(d(1:L) + d(L+1:2*L)), d(2*L+1:Lc)];   % Wrap output
      c = [(c(1:L) + c(L+1:2*L)), c(2*L+1:Lc)];   % Wrap output
      Lc = length(c);
   end
   d = [(d(1:N-1) + d(L+1:Lc)), d(N:L)]; % Wrap output
   d = d(1:2:L);                         % Down-sample wlets coeffs.
   c = [(c(1:N-1) + c(L+1:Lc)), c(N:L)]; % Wrap output
   c = c(1:2:L);                         % Down-sample scaling fn c.
   t = [d,t];                            % Concatenate wlet coeffs.
end                                      % Finish wavelet part
g = [c,t];                               % Add scaling fn coeff.


function a = choose(n,k)
% a = choose(n,k)
% BINOMIAL COEFFICIENTS
% allowable inputs:
%    n : integer,         k : integer
%    n : integer vector,  k : integer
%    n : integer,         k : integer vector
%    n : integer vector,  k : integer vector (of equal dimension)
nv = n;
kv = k;
if (length(nv) == 1) & (length(kv) > 1)
   nv = nv * ones(size(kv));
elseif (length(nv) > 1) & (length(kv) == 1)
   kv = kv * ones(size(nv));
end
a = nv;
for i = 1:length(nv)
   n = nv(i);
   k = kv(i);
   if n >= 0
      if k >= 0
         if n >= k
            c = prod(1:n)/(prod(1:k)*prod(1:n-k));
         else
            c = 0;
         end
      else
         c = 0;
      end
   else
      if k >= 0
         c = (-1)^k * prod(1:k-n-1)/(prod(1:k)*prod(1:-n-1));
      else
         if n >= k
            c = (-1)^(n-k)*prod(1:-k-1)/(prod(1:n-k)*prod(1:-n-1));
         else
            c = 0;
         end
      end
   end
   a(i) = c;
end


Bibliography 


In 1998 we especially recommended five books that complement this one. 
An excellent reference for the history, philosophy, and overview of wavelet 
analysis has been written by Barbara Burke Hubbard [link]. The best source 
for the mathematical details of wavelet theory is by Ingrid Daubechies 
[link]. Two good general books which start with the discrete-time wavelet 
series and filter bank methods are by Martin Vetterli and Jelena Kovacevic 
[link] and by Gilbert Strang and Truong Nguyen [link]. P. P. Vaidyanathan 
has written a good book on general multirate systems as well as filter banks 
[link]. 


Much of the recent interest in compactly supported wavelets was stimulated 
by Daubechies [link], [link], [link], [link] and S. Mallat [link], [link], [link] 
and others [link], [link]. A powerful point of view has been recently 
presented by D. L. Donoho, I. M. Johnstone, R. R. Coifman, and others 
[link], [link], [link], [link], [link], [link], [link], [link], [link], [link]. The 
development in the DSP community using filters has come from Smith and 
Barnwell [link], [link], Vetterli [link], [link], [link], [link], and 
Vaidyanathan [link], [link], [link]. Some of the work at Rice is reported in 
[link], [link], [link], [link], [link], [link], [link], [link], [link], [link], [link], 
[link], [link], [link], [link], [link]. Analysis and experimental work was done 
using the Matlab computer software system [link], [link]. Overview and 
introductory articles can be found in [link], [link], [link], [link], [link], 
[link], [link], [link], [link], [link], [link], [link], [link], [link], [link], [link]. 
Two special issues of IEEE Transactions have focused on wavelet methods 
[link], [link]. Books on wavelets, some of which are edited conference 
proceedings, include [link], [link], [link], [link], [link], [link], [link], [link], 
[link], [link], [link], [link], [link], [link], [link], [link], [link], [link], [link], 
[link], [link], [link], [link], [link], [link], [link], [link], [link], [link], [link], 
[link], [link], [link], [link], [link], [link], [link], [link], [link]. 


In this 2015 revision, we add several new references. An excellent 
collection of basic wavelet research papers has been published by Heil and 
Walnut [link]. A very good modern signal processing book, which is also 
available online, has been written by Kovacevic, Goyal, and Vetterli [link]. 
Stéphane Mallat has written a comprehensive third revised edition of his 
book on wavelets [link]. New work on lifting can be found in [link], [link], 
a general guide in [link], a book on frames in [link], and a new book on 
sampling in [link]. 


Another way to keep up with current research and results on wavelets is to 
read the Wavelet Digest on the world-wide-web at http://www.wavelet.org/ 
or to visit the Rice DSP site at http://dsp.rice.edu/software 